Assignment - Spatial Data Science¶

Student: Pierina Milla¶
In [ ]:
from IPython.display import IFrame  
ciaLink1="https://www.cia.gov/the-world-factbook/field/carbon-dioxide-emissions/country-comparison" 
IFrame(ciaLink1, width=900, height=900)

PART 1¶

In [129]:
# read web table into pandas DF
import pandas as pd

linkToFile='https://github.com/CienciaDeDatosEspacial/code_and_data/raw/main/data/carbonEmi_downloaded.csv'
carbon=pd.read_csv(linkToFile)
In [130]:
# here it is:
carbon
Out[130]:
name slug value date_of_information ranking region
0 China china 10,773,248,000.0 2019 est. 1 East and Southeast Asia
1 United States united-states 5,144,361,000.0 2019 est. 2 North America
2 India india 2,314,738,000.0 2019 est. 3 South Asia
3 Russia russia 1,848,070,000.0 2019 est. 4 Central Asia
4 Japan japan 1,103,234,000.0 2019 est. 5 East and Southeast Asia
... ... ... ... ... ... ...
213 Antarctica antarctica 28,000.0 2019 est. 214 Antarctica
214 Saint Helena, Ascension, and Tristan da Cunha saint-helena-ascension-and-tristan-da-cunha 13,000.0 2019 est. 215 Africa
215 Niue niue 8,000.0 2019 est. 216 Australia and Oceania
216 Northern Mariana Islands northern-mariana-islands 0.0 2019 est. 217 Australia and Oceania
217 Tuvalu tuvalu 0.0 2019 est. 218 Australia and Oceania

218 rows × 6 columns

1. Keep the columns name, value, date_of_information and region.¶

Tip: use drop, loc, and iloc for the same purpose (three ways to accomplish the task).
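
The three approaches can be compared on a small made-up table before applying them to `carbon`. This is a minimal sketch: the DataFrame `toy` and its values are invented for illustration, and only the column names mirror the real data.

```python
import pandas as pd

# Hypothetical miniature of the carbon table (values are made up)
toy = pd.DataFrame({
    'name': ['A', 'B'],
    'slug': ['a', 'b'],
    'value': ['1,000.0', '0.0'],
    'date_of_information': ['2019 est.', '2019 est.'],
    'ranking': [1, 2],
    'region': ['X', 'Y'],
})

keep = ['name', 'value', 'date_of_information', 'region']

# 1) drop: name the columns to remove; returns a new DataFrame
via_drop = toy.drop(columns=['slug', 'ranking'])

# 2) loc: select by label, listing the columns to keep
via_loc = toy.loc[:, keep]

# 3) iloc: select by integer position (0=name, 2=value, 3=date, 5=region)
via_iloc = toy.iloc[:, [0, 2, 3, 5]]

# All three routes should give the same table
print(via_drop.equals(via_loc) and via_loc.equals(via_iloc))
```

Selecting by label with `loc` tends to be the most robust of the three, since `iloc` positions shift as soon as a column is added or removed.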

In [136]:
# Goal: drop the columns slug and ranking
# First, copy the DataFrame so the original version is preserved
carbon_new=carbon.copy()
1.1 Using drop¶
In [132]:
byeColumns=['slug','ranking'] # more than one column can be dropped at once

# inplace=True modifies carbon_new directly instead of returning a new DataFrame
carbon_new.drop(columns=byeColumns,inplace=True)
# display the result
carbon_new
Out[132]:
name value date_of_information region
0 China 10,773,248,000.0 2019 est. East and Southeast Asia
1 United States 5,144,361,000.0 2019 est. North America
2 India 2,314,738,000.0 2019 est. South Asia
3 Russia 1,848,070,000.0 2019 est. Central Asia
4 Japan 1,103,234,000.0 2019 est. East and Southeast Asia
... ... ... ... ...
213 Antarctica 28,000.0 2019 est. Antarctica
214 Saint Helena, Ascension, and Tristan da Cunha 13,000.0 2019 est. Africa
215 Niue 8,000.0 2019 est. Australia and Oceania
216 Northern Mariana Islands 0.0 2019 est. Australia and Oceania
217 Tuvalu 0.0 2019 est. Australia and Oceania

218 rows × 4 columns

1.2 Using loc¶
In [133]:
# keep every column whose name is not in the list
carbon_new=carbon_new.loc[:, ~carbon_new.columns.isin(['slug','ranking'])]
In [24]:
carbon_new
Out[24]:
name value date_of_information region
0 China 10,773,248,000.0 2019 est. East and Southeast Asia
1 United States 5,144,361,000.0 2019 est. North America
2 India 2,314,738,000.0 2019 est. South Asia
3 Russia 1,848,070,000.0 2019 est. Central Asia
4 Japan 1,103,234,000.0 2019 est. East and Southeast Asia
... ... ... ... ...
213 Antarctica 28,000.0 2019 est. Antarctica
214 Saint Helena, Ascension, and Tristan da Cunha 13,000.0 2019 est. Africa
215 Niue 8,000.0 2019 est. Australia and Oceania
216 Northern Mariana Islands 0.0 2019 est. Australia and Oceania
217 Tuvalu 0.0 2019 est. Australia and Oceania

218 rows × 4 columns

1.3 Using iloc¶
In [137]:
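# select columns by position with a list comprehension, skipping positions 1 (slug) and 4 (ranking);
# start from the original carbon, which still has all six columns
carbon_new = carbon.iloc[:, [j for j in range(len(carbon.columns)) if j not in [1, 4]]].copy()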
In [138]:
carbon_new
Out[138]:
name value date_of_information region
0 China 10,773,248,000.0 2019 est. East and Southeast Asia
1 United States 5,144,361,000.0 2019 est. North America
2 India 2,314,738,000.0 2019 est. South Asia
3 Russia 1,848,070,000.0 2019 est. Central Asia
4 Japan 1,103,234,000.0 2019 est. East and Southeast Asia
... ... ... ... ...
213 Antarctica 28,000.0 2019 est. Antarctica
214 Saint Helena, Ascension, and Tristan da Cunha 13,000.0 2019 est. Africa
215 Niue 8,000.0 2019 est. Australia and Oceania
216 Northern Mariana Islands 0.0 2019 est. Australia and Oceania
217 Tuvalu 0.0 2019 est. Australia and Oceania

218 rows × 4 columns

2. Change the column name date_of_information to carbon_date.¶

Tip: Use rename.
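
`rename` returns a new DataFrame unless `inplace=True` is passed, and by default it silently ignores keys that match no column. A minimal sketch on an invented table (`toy` is hypothetical) shows both behaviours and how `errors='raise'` catches a misspelled name:

```python
import pandas as pd

toy = pd.DataFrame({'name': ['A'], 'date_of_information': ['2019 est.']})

# rename returns a new DataFrame; toy itself is left unchanged
renamed = toy.rename(columns={'date_of_information': 'carbon_date'})
print(list(toy.columns))
print(list(renamed.columns))

# A misspelled key is ignored by default, so the typo goes unnoticed
unchanged = toy.rename(columns={'date_of_info': 'carbon_date'})

# errors='raise' turns the same typo into a KeyError
try:
    toy.rename(columns={'date_of_info': 'carbon_date'}, errors='raise')
    typo_detected = False
except KeyError:
    typo_detected = True
print(typo_detected)
```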

In [139]:
carbon_new.rename(columns={'date_of_information':'carbon_date'}, inplace=True)
In [141]:
carbon_new
Out[141]:
name value carbon_date region
0 China 10,773,248,000.0 2019 est. East and Southeast Asia
1 United States 5,144,361,000.0 2019 est. North America
2 India 2,314,738,000.0 2019 est. South Asia
3 Russia 1,848,070,000.0 2019 est. Central Asia
4 Japan 1,103,234,000.0 2019 est. East and Southeast Asia
... ... ... ... ...
213 Antarctica 28,000.0 2019 est. Antarctica
214 Saint Helena, Ascension, and Tristan da Cunha 13,000.0 2019 est. Africa
215 Niue 8,000.0 2019 est. Australia and Oceania
216 Northern Mariana Islands 0.0 2019 est. Australia and Oceania
217 Tuvalu 0.0 2019 est. Australia and Oceania

218 rows × 4 columns

3. Make sure the text cells have neither leading nor trailing spaces.¶

Tip: use strip.
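
Because `str.strip()` returns a new Series instead of changing the column, the result has to be assigned back. A minimal sketch, using an invented DataFrame `toy` with deliberately padded cells, strips every text column and then confirms that a second pass would change nothing:

```python
import pandas as pd

# Hypothetical data with leading and trailing spaces added on purpose
toy = pd.DataFrame({'name': [' China', 'India '], 'region': ['Asia ', ' Asia']})

# Strip each text column and assign the result back to the DataFrame
text_cols = ['name', 'region']
for col in text_cols:
    toy[col] = toy[col].str.strip()

# Verify: count the cells that still differ from their stripped version
still_padded = int(sum((toy[col] != toy[col].str.strip()).sum() for col in text_cols))
print(still_padded)
```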

In [35]:
carbon_new.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 218 entries, 0 to 217
Data columns (total 4 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   name                 218 non-null    object
 1   value                218 non-null    object
 2   date_of_information  218 non-null    object
 3   region               218 non-null    object
dtypes: object(4)
memory usage: 6.9+ KB
In [17]:
# Inspect the raw strings to spot any leading or trailing spaces
carbon_new.region.to_list()  # replace region with another column name to check it
Out[17]:
['East and Southeast Asia',
 'North America',
 'South Asia',
 'Central Asia',
 'East and Southeast Asia',
 'Europe',
 'East and Southeast Asia',
 'Middle East',
 'North America',
 'Middle East',
 'East and Southeast Asia',
 'Africa',
 'North America',
 'South America',
 'Australia and Oceania',
 'Europe',
 'Middle East',
 'Europe',
 'Europe',
 'East and Southeast Asia',
 'Europe',
 'Europe',
 'East and Southeast Asia',
 'Middle East',
 'Central Asia',
 'East and Southeast Asia',
 'East and Southeast Asia',
 'East and Southeast Asia',
 'Africa',
 'Europe',
 'South Asia',
 'South America',
 'Europe',
 'Africa',
 'Middle East',
 'East and Southeast Asia',
 'Europe',
 'Middle East',
 'Central Asia',
 'Africa',
 'South America',
 'Central Asia',
 'Europe',
 'South Asia',
 'Middle East',
 'East and Southeast Asia',
 'South America',
 'South America',
 'Middle East',
 'Europe',
 'Europe',
 'Europe',
 'Middle East',
 'Africa',
 'South America',
 'Europe',
 'Europe',
 'Europe',
 'Europe',
 'Europe',
 'Middle East',
 'Europe',
 'East and Southeast Asia',
 'Australia and Oceania',
 'Central America and the Caribbean',
 'Europe',
 'Europe',
 'Africa',
 'Europe',
 'South America',
 'Europe',
 'Middle East',
 'Europe',
 'Europe',
 'East and Southeast Asia',
 'Middle East',
 'Central America and the Caribbean',
 'Middle East',
 'Central America and the Caribbean',
 'South Asia',
 'Africa',
 'Middle East',
 'East and Southeast Asia',
 'Africa',
 'Central America and the Caribbean',
 'Central America and the Caribbean',
 'East and Southeast Asia',
 'Africa',
 'South America',
 'Africa',
 'Africa',
 'Africa',
 'Europe',
 'Central America and the Caribbean',
 'Europe',
 'Europe',
 'Europe',
 'East and Southeast Asia',
 'Europe',
 'Africa',
 'Africa',
 'Europe',
 'Africa',
 'Middle East',
 'Middle East',
 'Central America and the Caribbean',
 'East and Southeast Asia',
 'Europe',
 'Central America and the Caribbean',
 'Europe',
 'Central America and the Caribbean',
 'Europe',
 'Europe',
 'Europe',
 'Africa',
 'South America',
 'South Asia',
 'Central Asia',
 'Africa',
 'South Asia',
 'Central Asia',
 'Central America and the Caribbean',
 'Europe',
 'Africa',
 'Africa',
 'Africa',
 'Africa',
 'East and Southeast Asia',
 'South America',
 'Middle East',
 'Africa',
 'Australia and Oceania',
 'Africa',
 'Africa',
 'Europe',
 'Central America and the Caribbean',
 'Africa',
 'Africa',
 'Africa',
 'Africa',
 'Africa',
 'Central America and the Caribbean',
 'Africa',
 'Europe',
 'Middle East',
 'Middle East',
 'Europe',
 'Central America and the Caribbean',
 'Africa',
 'South America',
 'Africa',
 'Africa',
 'Europe',
 'Central America and the Caribbean',
 'Africa',
 'South America',
 'South Asia',
 'Africa',
 'East and Southeast Asia',
 'Australia and Oceania',
 'Africa',
 'Africa',
 'Central America and the Caribbean',
 'Australia and Oceania',
 'Africa',
 'Africa',
 'Australia and Oceania',
 'Central America and the Caribbean',
 'Africa',
 'Africa',
 'Africa',
 'Australia and Oceania',
 'Africa',
 'South Asia',
 'Africa',
 'Africa',
 'Europe',
 'Central America and the Caribbean',
 'Africa',
 'North America',
 'Africa',
 'Central America and the Caribbean',
 'Africa',
 'Central America and the Caribbean',
 'Africa',
 'Africa',
 'North America',
 'Central America and the Caribbean',
 'East and Southeast Asia',
 'Europe',
 'Australia and Oceania',
 'Australia and Oceania',
 'Africa',
 'Australia and Oceania',
 'Africa',
 'Central America and the Caribbean',
 'Australia and Oceania',
 'Africa',
 'Central America and the Caribbean',
 'Central America and the Caribbean',
 'Australia and Oceania',
 'Central America and the Caribbean',
 'Central America and the Caribbean',
 'Australia and Oceania',
 'Central America and the Caribbean',
 'Africa',
 'Australia and Oceania',
 'Australia and Oceania',
 'North America',
 'Australia and Oceania',
 'Australia and Oceania',
 'South America',
 'Central America and the Caribbean',
 'Antarctica',
 'Africa',
 'Australia and Oceania',
 'Australia and Oceania',
 'Australia and Oceania']
In [142]:
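# str.strip() returns a new Series, so assign each result back to its column
carbon_new['name']=carbon_new['name'].str.strip()
carbon_new['value']=carbon_new['value'].str.strip()
carbon_new['carbon_date']=carbon_new['carbon_date'].str.strip()
carbon_new['region']=carbon_new['region'].str.strip()
carbon_new.region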
Out[142]:
0      East and Southeast Asia
1                North America
2                   South Asia
3                 Central Asia
4      East and Southeast Asia
                ...           
213                 Antarctica
214                     Africa
215      Australia and Oceania
216      Australia and Oceania
217      Australia and Oceania
Name: region, Length: 218, dtype: object
In [19]:
# Check the strings again after stripping
carbon_new.region.to_list()
Out[19]:
['East and Southeast Asia',
 'North America',
 'South Asia',
 'Central Asia',
 'East and Southeast Asia',
 'Europe',
 'East and Southeast Asia',
 'Middle East',
 'North America',
 'Middle East',
 'East and Southeast Asia',
 'Africa',
 'North America',
 'South America',
 'Australia and Oceania',
 'Europe',
 'Middle East',
 'Europe',
 'Europe',
 'East and Southeast Asia',
 'Europe',
 'Europe',
 'East and Southeast Asia',
 'Middle East',
 'Central Asia',
 'East and Southeast Asia',
 'East and Southeast Asia',
 'East and Southeast Asia',
 'Africa',
 'Europe',
 'South Asia',
 'South America',
 'Europe',
 'Africa',
 'Middle East',
 'East and Southeast Asia',
 'Europe',
 'Middle East',
 'Central Asia',
 'Africa',
 'South America',
 'Central Asia',
 'Europe',
 'South Asia',
 'Middle East',
 'East and Southeast Asia',
 'South America',
 'South America',
 'Middle East',
 'Europe',
 'Europe',
 'Europe',
 'Middle East',
 'Africa',
 'South America',
 'Europe',
 'Europe',
 'Europe',
 'Europe',
 'Europe',
 'Middle East',
 'Europe',
 'East and Southeast Asia',
 'Australia and Oceania',
 'Central America and the Caribbean',
 'Europe',
 'Europe',
 'Africa',
 'Europe',
 'South America',
 'Europe',
 'Middle East',
 'Europe',
 'Europe',
 'East and Southeast Asia',
 'Middle East',
 'Central America and the Caribbean',
 'Middle East',
 'Central America and the Caribbean',
 'South Asia',
 'Africa',
 'Middle East',
 'East and Southeast Asia',
 'Africa',
 'Central America and the Caribbean',
 'Central America and the Caribbean',
 'East and Southeast Asia',
 'Africa',
 'South America',
 'Africa',
 'Africa',
 'Africa',
 'Europe',
 'Central America and the Caribbean',
 'Europe',
 'Europe',
 'Europe',
 'East and Southeast Asia',
 'Europe',
 'Africa',
 'Africa',
 'Europe',
 'Africa',
 'Middle East',
 'Middle East',
 'Central America and the Caribbean',
 'East and Southeast Asia',
 'Europe',
 'Central America and the Caribbean',
 'Europe',
 'Central America and the Caribbean',
 'Europe',
 'Europe',
 'Europe',
 'Africa',
 'South America',
 'South Asia',
 'Central Asia',
 'Africa',
 'South Asia',
 'Central Asia',
 'Central America and the Caribbean',
 'Europe',
 'Africa',
 'Africa',
 'Africa',
 'Africa',
 'East and Southeast Asia',
 'South America',
 'Middle East',
 'Africa',
 'Australia and Oceania',
 'Africa',
 'Africa',
 'Europe',
 'Central America and the Caribbean',
 'Africa',
 'Africa',
 'Africa',
 'Africa',
 'Africa',
 'Central America and the Caribbean',
 'Africa',
 'Europe',
 'Middle East',
 'Middle East',
 'Europe',
 'Central America and the Caribbean',
 'Africa',
 'South America',
 'Africa',
 'Africa',
 'Europe',
 'Central America and the Caribbean',
 'Africa',
 'South America',
 'South Asia',
 'Africa',
 'East and Southeast Asia',
 'Australia and Oceania',
 'Africa',
 'Africa',
 'Central America and the Caribbean',
 'Australia and Oceania',
 'Africa',
 'Africa',
 'Australia and Oceania',
 'Central America and the Caribbean',
 'Africa',
 'Africa',
 'Africa',
 'Australia and Oceania',
 'Africa',
 'South Asia',
 'Africa',
 'Africa',
 'Europe',
 'Central America and the Caribbean',
 'Africa',
 'North America',
 'Africa',
 'Central America and the Caribbean',
 'Africa',
 'Central America and the Caribbean',
 'Africa',
 'Africa',
 'North America',
 'Central America and the Caribbean',
 'East and Southeast Asia',
 'Europe',
 'Australia and Oceania',
 'Australia and Oceania',
 'Africa',
 'Australia and Oceania',
 'Africa',
 'Central America and the Caribbean',
 'Australia and Oceania',
 'Africa',
 'Central America and the Caribbean',
 'Central America and the Caribbean',
 'Australia and Oceania',
 'Central America and the Caribbean',
 'Central America and the Caribbean',
 'Australia and Oceania',
 'Central America and the Caribbean',
 'Africa',
 'Australia and Oceania',
 'Australia and Oceania',
 'North America',
 'Australia and Oceania',
 'Australia and Oceania',
 'South America',
 'Central America and the Caribbean',
 'Antarctica',
 'Africa',
 'Australia and Oceania',
 'Australia and Oceania',
 'Australia and Oceania']

4. Detect symbols in the numeric data that are neither digits nor a decimal point.¶

Tip: Use contains.
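
To see what a character class flags, it helps to try it on a few hand-made strings. The sketch below uses an invented Series `s`; the pattern `[^\d\.]` (anything that is not a digit or a point) is an illustrative, stricter alternative to a `\w`-based class, which also accepts letters and underscores.

```python
import pandas as pd

# Hypothetical sample strings, not taken from the real table
s = pd.Series(['1,000.0', '0.0', '2019 est.', '2019'])

# True where the string has any character other than a digit or a point
has_symbol = s.str.contains(r'[^\d\.]', regex=True)
print(s[has_symbol].tolist())   # ['1,000.0', '2019 est.']
```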

4.1 Solving carbon_date¶
In [38]:
carbon_new = carbon_new.copy()
In [143]:
carbon_new
Out[143]:
name value carbon_date region
0 China 10,773,248,000.0 2019 est. East and Southeast Asia
1 United States 5,144,361,000.0 2019 est. North America
2 India 2,314,738,000.0 2019 est. South Asia
3 Russia 1,848,070,000.0 2019 est. Central Asia
4 Japan 1,103,234,000.0 2019 est. East and Southeast Asia
... ... ... ... ...
213 Antarctica 28,000.0 2019 est. Antarctica
214 Saint Helena, Ascension, and Tristan da Cunha 13,000.0 2019 est. Africa
215 Niue 8,000.0 2019 est. Australia and Oceania
216 Northern Mariana Islands 0.0 2019 est. Australia and Oceania
217 Tuvalu 0.0 2019 est. Australia and Oceania

218 rows × 4 columns

In [144]:
# Is there any cell with a character that is neither alphanumeric (\w) nor a point (\.)?
# [^ ] negates the class; in '2019 est.' the character that matches is the space
carbon_new.carbon_date[carbon_new.carbon_date.str.contains(pat=r'[^\w\.]',regex=True)]
Out[144]:
0      2019 est.
1      2019 est.
2      2019 est.
3      2019 est.
4      2019 est.
         ...    
213    2019 est.
214    2019 est.
215    2019 est.
216    2019 est.
217    2019 est.
Name: carbon_date, Length: 218, dtype: object
4.2 Solving value¶
In [145]:
carbon_new.value[carbon_new.value.str.contains(pat=r'[^\w\.]',regex=True)]
Out[145]:
0      10,773,248,000.0
1       5,144,361,000.0
2       2,314,738,000.0
3       1,848,070,000.0
4       1,103,234,000.0
             ...       
211            46,000.0
212            33,000.0
213            28,000.0
214            13,000.0
215             8,000.0
Name: value, Length: 216, dtype: object
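
In the rows shown, the flagged symbol is the comma used as a thousands separator. One possible follow-up, which is not among the original exercise steps, is to remove the commas and convert the column to numbers. A minimal sketch on an invented Series (the name `raw` is hypothetical):

```python
import pandas as pd

# Hypothetical strings in the same format as the value column
raw = pd.Series(['10,773,248,000.0', '8,000.0', '0.0'])

# Remove the thousands separators, then convert the text to floats
clean = pd.to_numeric(raw.str.replace(',', '', regex=False))
print(clean.tolist())   # [10773248000.0, 8000.0, 0.0]
```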

5. Make sure there are no spaces as part of the column names.¶

Tip: use replace.

In [146]:
carbon_new.columns.str.contains(' ')
Out[146]:
array([False, False, False, False])
In [147]:
carbon_new.columns[carbon_new.columns.str.contains(' ')]
Out[147]:
Index([], dtype='object')

6. Get rid of any value detected in the previous step:¶

Tip: use replace.

In [148]:
carbon_new.columns = carbon_new.columns.str.replace(' ', '_')
In [149]:
carbon_new
Out[149]:
name value carbon_date region
0 China 10,773,248,000.0 2019 est. East and Southeast Asia
1 United States 5,144,361,000.0 2019 est. North America
2 India 2,314,738,000.0 2019 est. South Asia
3 Russia 1,848,070,000.0 2019 est. Central Asia
4 Japan 1,103,234,000.0 2019 est. East and Southeast Asia
... ... ... ... ...
213 Antarctica 28,000.0 2019 est. Antarctica
214 Saint Helena, Ascension, and Tristan da Cunha 13,000.0 2019 est. Africa
215 Niue 8,000.0 2019 est. Australia and Oceania
216 Northern Mariana Islands 0.0 2019 est. Australia and Oceania
217 Tuvalu 0.0 2019 est. Australia and Oceania

218 rows × 4 columns

7. Keep only the year value in the column carbon_date.¶

Tip: use extract.

In [153]:
# Keep only the digits (the year) by removing every non-digit character
carbon_new.carbon_date=carbon_new.carbon_date.str.replace(pat= r'[^0-9]', repl= '',regex=True)
In [154]:
carbon_new
Out[154]:
name value carbon_date region
0 China 10,773,248,000.0 2019 East and Southeast Asia
1 United States 5,144,361,000.0 2019 North America
2 India 2,314,738,000.0 2019 South Asia
3 Russia 1,848,070,000.0 2019 Central Asia
4 Japan 1,103,234,000.0 2019 East and Southeast Asia
... ... ... ... ...
213 Antarctica 28,000.0 2019 Antarctica
214 Saint Helena, Ascension, and Tristan da Cunha 13,000.0 2019 Africa
215 Niue 8,000.0 2019 Australia and Oceania
216 Northern Mariana Islands 0.0 2019 Australia and Oceania
217 Tuvalu 0.0 2019 Australia and Oceania

218 rows × 4 columns
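
The tip for this step mentions `extract`, while the cell above reaches the same result by deleting every non-digit with `replace`. A minimal sketch of the `extract` route, on an invented Series `dates`, captures the first run of four digits; unlike the deletion approach, it does not merge digits that appear elsewhere in the string:

```python
import pandas as pd

# Hypothetical date strings; the third one is made up to show the difference
dates = pd.Series(['2019 est.', '2017 est.', 'FY2018/19 est.'])

# expand=False returns a Series holding the single capture group
year = dates.str.extract(r'(\d{4})', expand=False)
print(year.tolist())   # ['2019', '2017', '2018']

# Deleting non-digits instead turns the third entry into '201819'
merged = dates.str.replace(r'[^0-9]', '', regex=True)
print(merged.tolist())
```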

PART 2¶

  • Exercise 2: Scrape the data on Revenue from forest resources.
In [66]:
from IPython.display import IFrame  
ciaLink2="https://www.cia.gov/the-world-factbook/field/revenue-from-forest-resources/country-comparison" 
IFrame(ciaLink2, width=900, height=900)
Out[66]:
In [77]:
# read web table into pandas DF
import pandas as pd

forestDFs=pd.read_html(ciaLink2, # link
                        header=0, # the first row holds the column names
                        flavor='bs4')
In [99]:
forest=forestDFs[0].copy()
In [79]:
forest
Out[79]:
Rank Country % of GDP Date of Information
0 1 Solomon Islands 20.27 2018 est.
1 2 Liberia 13.27 2018 est.
2 3 Burundi 10.31 2018 est.
3 4 Guinea-Bissau 9.24 2018 est.
4 5 Central African Republic 8.99 2018 est.
... ... ... ... ...
199 200 Guam 0.00 2018 est.
200 201 Faroe Islands 0.00 2017 est.
201 202 Aruba 0.00 2017 est.
202 203 Virgin Islands 0.00 2017 est.
203 204 Macau 0.00 2018 est.

204 rows × 4 columns

1. Replace '%' by 'pct'.¶

Tip: use replace.

In [80]:
forest.rename(columns={'% of GDP': 'pct of GDP'}, inplace=True)
forest
Out[80]:
Rank Country pct of GDP Date of Information
0 1 Solomon Islands 20.27 2018 est.
1 2 Liberia 13.27 2018 est.
2 3 Burundi 10.31 2018 est.
3 4 Guinea-Bissau 9.24 2018 est.
4 5 Central African Republic 8.99 2018 est.
... ... ... ... ...
199 200 Guam 0.00 2018 est.
200 201 Faroe Islands 0.00 2017 est.
201 202 Aruba 0.00 2017 est.
202 203 Virgin Islands 0.00 2017 est.
203 204 Macau 0.00 2018 est.

204 rows × 4 columns

In [90]:
# alternative: the same replacement applied through the column index
forest.columns=forest.columns.str.replace('% of GDP','pct of GDP')
In [91]:
forest
Out[91]:
Rank Country pct of GDP Date of Information
0 1 Solomon Islands 20.27 2018 est.
1 2 Liberia 13.27 2018 est.
2 3 Burundi 10.31 2018 est.
3 4 Guinea-Bissau 9.24 2018 est.
4 5 Central African Republic 8.99 2018 est.
... ... ... ... ...
199 200 Guam 0.00 2018 est.
200 201 Faroe Islands 0.00 2017 est.
201 202 Aruba 0.00 2017 est.
202 203 Virgin Islands 0.00 2017 est.
203 204 Macau 0.00 2018 est.

204 rows × 4 columns

2. Keep the columns Country, pct of GDP, and Date of Information.¶

Tip: use drop, loc, and iloc for the same purpose (three ways to accomplish the task).

2.1 Using drop¶
In [92]:
#this is the result
forest.drop(columns='Rank',inplace=True) # inplace=True modifies forest directly instead of returning a new DataFrame
#then
forest
Out[92]:
Country pct of GDP Date of Information
0 Solomon Islands 20.27 2018 est.
1 Liberia 13.27 2018 est.
2 Burundi 10.31 2018 est.
3 Guinea-Bissau 9.24 2018 est.
4 Central African Republic 8.99 2018 est.
... ... ... ...
199 Guam 0.00 2018 est.
200 Faroe Islands 0.00 2017 est.
201 Aruba 0.00 2017 est.
202 Virgin Islands 0.00 2017 est.
203 Macau 0.00 2018 est.

204 rows × 3 columns

2.2 Using loc¶
In [95]:
forest=forest.loc[:, ~forest.columns.isin(['Rank'])]
In [96]:
forest
Out[96]:
Country % of GDP Date of Information
0 Solomon Islands 20.27 2018 est.
1 Liberia 13.27 2018 est.
2 Burundi 10.31 2018 est.
3 Guinea-Bissau 9.24 2018 est.
4 Central African Republic 8.99 2018 est.
... ... ... ...
199 Guam 0.00 2018 est.
200 Faroe Islands 0.00 2017 est.
201 Aruba 0.00 2017 est.
202 Virgin Islands 0.00 2017 est.
203 Macau 0.00 2018 est.

204 rows × 3 columns

2.3 Using iloc¶
In [120]:
forest=forestDFs[0].copy()
In [104]:
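# select every column except position 0 (Rank); the result is only displayed, not assigned back to forest
forest.iloc[:, [j for j in range(len(forest.columns)) if j not in [0]]]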
Out[104]:
Country % of GDP Date of Information
0 Solomon Islands 20.27 2018 est.
1 Liberia 13.27 2018 est.
2 Burundi 10.31 2018 est.
3 Guinea-Bissau 9.24 2018 est.
4 Central African Republic 8.99 2018 est.
... ... ... ...
199 Guam 0.00 2018 est.
200 Faroe Islands 0.00 2017 est.
201 Aruba 0.00 2017 est.
202 Virgin Islands 0.00 2017 est.
203 Macau 0.00 2018 est.

204 rows × 3 columns

3. Change the column name Date of Information to forest_date.¶

Tip: Use rename.

In [121]:
forest.rename(columns={'Date of Information':'forest_date'}, inplace=True)
forest
Out[121]:
Rank Country % of GDP forest_date
0 1 Solomon Islands 20.27 2018 est.
1 2 Liberia 13.27 2018 est.
2 3 Burundi 10.31 2018 est.
3 4 Guinea-Bissau 9.24 2018 est.
4 5 Central African Republic 8.99 2018 est.
... ... ... ... ...
199 200 Guam 0.00 2018 est.
200 201 Faroe Islands 0.00 2017 est.
201 202 Aruba 0.00 2017 est.
202 203 Virgin Islands 0.00 2017 est.
203 204 Macau 0.00 2018 est.

204 rows × 4 columns

4. Make sure there are no spaces as part of the column names.¶

Tip: use replace.

In [122]:
forest.columns.str.contains(' ')
Out[122]:
array([False, False,  True, False])
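
The check above flags one column name that still contains spaces. A minimal sketch of one way to fix it, on an invented empty DataFrame `toy` that reuses the same headers: replace `%` with `pct` and every space with an underscore. The resulting name `pct_of_GDP` is an illustrative choice, not something fixed by the exercise.

```python
import pandas as pd

# Hypothetical empty table with the same headers as forest at this point
toy = pd.DataFrame(columns=['Rank', 'Country', '% of GDP', 'forest_date'])

# regex=False treats '%' and ' ' as literal text, not as patterns
toy.columns = (toy.columns
               .str.replace('%', 'pct', regex=False)
               .str.replace(' ', '_', regex=False))

print(toy.columns.tolist())
# Re-run the check: no column name should contain a space now
print(toy.columns.str.contains(' ').any())
```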

5. Make sure the text cells have neither leading nor trailing spaces.¶

Tip: use strip.

In [123]:
# note: this strips the column names, not the cell values, and the result is not assigned back
forest.columns.str.strip()
Out[123]:
Index(['Rank', 'Country', '% of GDP', 'forest_date'], dtype='object')
In [124]:
forest.columns.to_list()
Out[124]:
['Rank', 'Country', '% of GDP', 'forest_date']

6. Keep only the year value in the column forest_date.¶

Tip: use extract.

In [127]:
forest.forest_date=forest.forest_date.str.replace(pat= r'[^0-9]', repl= '',regex=True)
In [128]:
forest
Out[128]:
Rank Country % of GDP forest_date
0 1 Solomon Islands 20.27 2018
1 2 Liberia 13.27 2018
2 3 Burundi 10.31 2018
3 4 Guinea-Bissau 9.24 2018
4 5 Central African Republic 8.99 2018
... ... ... ... ...
199 200 Guam 0.00 2018
200 201 Faroe Islands 0.00 2017
201 202 Aruba 0.00 2017
202 203 Virgin Islands 0.00 2017
203 204 Macau 0.00 2018

204 rows × 4 columns
