Evaluación 1 - Introducción a Ciencia de Datos Espaciales¶

Estudiante: Pierina Milla

1. Merge de las 3 fuentes seleccionadas¶

Importo la librería Pandas y leo los 3 archivos con la data que han sido subidos al repositorio Github.

  1. Emisiones de CO2 sin LULUCF
    • Unidad: Kilotones
    • Fuente: United Nations Framework Convention on Climate Change
    • Año: 1990
  2. Precipitation
    • Unidad: Milímetros
    • Fuente: World Metereological Organization
    • Periodo: 1958-1990
  3. Solar electricity
    • Unidad: millones Kilowatt-hora
    • Fuente: United Nations Statistics Division
    • Año: 2020
In [300]:
import pandas as pd

co2Link='https://github.com/IntroSpatialDataScience/ClimateChangeData/raw/main/data/CO2_1990.csv'
precipitationLink = 'https://github.com/IntroSpatialDataScience/ClimateChangeData/raw/main/data/precipitation_mean.csv'
solarElectricityLink = 'https://github.com/IntroSpatialDataScience/ClimateChangeData/raw/main/data/solar_energy.csv'
co2=pd.read_csv(co2Link)
precipitation=pd.read_csv(precipitationLink)
solar = pd.read_csv(solarElectricityLink)

Realizo una copia de cada dataframe para trabajar sin modificar el archivo original

In [301]:
co2_new=co2.copy()
precipitation_new=precipitation.copy()
solar_new=solar.copy()

Cambio los nombres de las columnas referentes al territorio o país de cada DF para facilitar el merge entre ellos.

In [302]:
co2_new.rename(columns={'Country or Area':'Country'}, inplace=True)
precipitation_new.rename(columns={'Country or Territory':'Country'}, inplace=True)
solar_new.rename(columns={'Country or Area':'Country'}, inplace=True)

Toda la data en el mismo formato 'title':

In [303]:
precipitation_new.Country = precipitation_new.Country.str.title()
co2_new.Country = co2_new.Country.str.title()
solar_new.Country = solar_new.Country.str.title()

El número de filas y columnas de cada dataframe es:

In [304]:
co2_new.shape,precipitation_new.shape,solar_new.shape
Out[304]:
((43, 3), (4263, 33), (205, 5))
In [305]:
co2_new.columns,precipitation_new.columns,solar_new.columns
Out[305]:
(Index(['Country', 'Year', 'Value'], dtype='object'),
 Index(['Country', 'Station Name', 'Period', 'Statistic Description', 'Unit',
        'Jan', 'Jan Footnotes', 'Feb', 'Feb Footnotes', 'Mar', 'Mar Footnotes',
        'Apr', 'Apr Footnotes', 'May', 'May Footnotes', 'Jun', 'Jun Footnotes',
        'Jul', 'Jul Footnotes', 'Aug', 'Aug Footnotes', 'Sep', 'Sep Footnotes',
        'Oct', 'Oct Footnotes', 'Nov', 'Nov Footnotes', 'Dec', 'Dec Footnotes',
        'Annual', 'Annual Footnotes', 'Annual NCDC Computed Value',
        'Annual NCDC Computed Value Footnotes'],
       dtype='object'),
 Index(['Country', 'Year', 'Unit', 'Quantity', 'Quantity Footnotes'], dtype='object'))

Fuzzy Merge¶

In [306]:
#Países que se encuentran solo en CO2 dataframe y no en solarDataframe
OnlyCO2=set(co2_new.Country)-set(solar_new.Country)
OnlyCO2
Out[306]:
{'European Union', 'Iceland', 'Monaco', 'United States Of America'}
In [307]:
#Países que se encuentran solo en solar dataframe y no en CO2 dataframe
OnlySolar=set(solar_new.Country)-set(co2_new.Country)
OnlySolar
Out[307]:
{'1',
 'Afghanistan',
 'Albania',
 'Algeria',
 'American Samoa',
 'Andorra',
 'Angola',
 'Anguilla',
 'Antigua And Barbuda',
 'Argentina',
 'Armenia',
 'Aruba',
 'Azerbaijan',
 'Bahamas',
 'Bahrain',
 'Bangladesh',
 'Barbados',
 'Belize',
 'Benin',
 'Bermuda',
 'Bhutan',
 'Bolivia (Plur. State Of)',
 'Bonaire, St Eustatius, Saba',
 'Bosnia And Herzegovina',
 'Botswana',
 'Brazil',
 'British Virgin Islands',
 'Brunei Darussalam',
 'Burkina Faso',
 'Burundi',
 'Cabo Verde',
 'Cambodia',
 'Cameroon',
 'Central African Rep.',
 'Chile',
 'China',
 'Colombia',
 'Congo',
 'Cook Islands',
 'Costa Rica',
 'Cuba',
 'Curaçao',
 "Côte D'Ivoire",
 'Dem. Rep. Of The Congo',
 'Djibouti',
 'Dominica',
 'Dominican Republic',
 'Ecuador',
 'Egypt',
 'El Salvador',
 'Eritrea',
 'Eswatini',
 'Ethiopia',
 'Faeroe Islands',
 'Falkland Is. (Malvinas)',
 'Fiji',
 'Fnseqid',
 'French Polynesia',
 'Gabon',
 'Gambia',
 'Ghana',
 'Grenada',
 'Guam',
 'Guatemala',
 'Guernsey',
 'Guinea-Bissau',
 'Guyana',
 'Haiti',
 'Honduras',
 'India',
 'Indonesia',
 'Iran (Islamic Rep. Of)',
 'Iraq',
 'Israel',
 'Jamaica',
 'Jordan',
 'Kazakhstan',
 'Kenya',
 'Kiribati',
 "Korea, Dem.Ppl'S.Rep.",
 'Korea, Republic Of',
 'Kosovo',
 'Kuwait',
 "Lao People'S Dem. Rep.",
 'Lebanon',
 'Liberia',
 'Libya',
 'Madagascar',
 'Malawi',
 'Malaysia',
 'Maldives',
 'Mali',
 'Marshall Islands',
 'Mauritania',
 'Mauritius',
 'Mexico',
 'Micronesia (Fed. States Of)',
 'Mongolia',
 'Montenegro',
 'Montserrat',
 'Morocco',
 'Mozambique',
 'Myanmar',
 'Namibia',
 'Nauru',
 'Nepal',
 'New Caledonia',
 'Nicaragua',
 'Niger',
 'Nigeria',
 'Niue',
 'North Macedonia',
 'Northern Mariana Islands',
 'Oman',
 'Other Asia',
 'Pakistan',
 'Palau',
 'Panama',
 'Papua New Guinea',
 'Peru',
 'Philippines',
 'Puerto Rico',
 'Republic Of Moldova',
 'Rwanda',
 'Samoa',
 'Sao Tome And Principe',
 'Saudi Arabia',
 'Senegal',
 'Serbia',
 'Seychelles',
 'Sierra Leone',
 'Singapore',
 'Solomon Islands',
 'Somalia',
 'South Africa',
 'South Sudan',
 'Sri Lanka',
 'St. Helena And Depend.',
 'St. Kitts-Nevis',
 'St. Lucia',
 'St. Vincent-Grenadines',
 'State Of Palestine',
 'Sudan',
 'Suriname',
 'Thailand',
 'Timor-Leste',
 'Togo',
 'Tonga',
 'Trinidad And Tobago',
 'Tunisia',
 'Turks And Caicos Islands',
 'Tuvalu',
 'Uganda',
 'United Arab Emirates',
 'United Rep. Of Tanzania',
 'United States',
 'United States Virgin Is.',
 'Uruguay',
 'Uzbekistan',
 'Vanuatu',
 'Venezuela (Bolivar. Rep.)',
 'Viet Nam',
 'Wallis And Futuna Is.',
 'Yemen',
 'Zambia',
 'Zimbabwe'}
In [308]:
from thefuzz import process as fz

# Tomamos un pais de OnlySolar
# Miramos en un país de OnlyCO2 y retornamos el más similar
[(f,fz.extractOne(f, OnlySolar)) for f in sorted(OnlyCO2)]
Out[308]:
[('European Union', ('Benin', 60)),
 ('Iceland', ('Thailand', 67)),
 ('Monaco', ('Morocco', 62)),
 ('United States Of America', ('United States', 90))]

Selecciono los mejores matches:

In [309]:
match_CO2_Solar = [(f,fz.extractOne(f, OnlySolar)) for f in sorted(OnlyCO2)
 if fz.extractOne(f, OnlySolar)[1]>=87]
match_CO2_Solar= {f:fz.extractOne(f, OnlySolar)[0] 
                 for f in sorted(OnlyCO2)
                 if fz.extractOne(f, OnlySolar)[1] >=87}
match_CO2_Solar
Out[309]:
{'United States Of America': 'United States'}

Reemplazo los nuevos nombres en el dataframe de CO2

In [310]:
co2_new.Country.replace(to_replace=match_CO2_Solar,inplace=True)
co2_new
Out[310]:
Country Year Value
0 Australia 1990 2.781542e+05
1 Austria 1990 6.214525e+04
2 Belarus 1990 1.083451e+05
3 Belgium 1990 1.202926e+05
4 Bulgaria 1990 7.669920e+04
5 Canada 1990 4.582183e+05
6 Croatia 1990 2.297979e+04
7 Cyprus 1990 4.653217e+03
8 Czechia 1990 1.642107e+05
9 Denmark 1990 5.487931e+04
10 Estonia 1990 3.692221e+04
11 European Union 1990 4.470262e+06
12 Finland 1990 5.691434e+04
13 France 1990 4.009646e+05
14 Germany 1990 1.051979e+06
15 Greece 1990 8.343804e+04
16 Hungary 1990 7.322554e+04
17 Iceland 1990 2.215856e+03
18 Ireland 1990 3.294442e+04
19 Italy 1990 4.395498e+05
20 Japan 1990 1.158129e+06
21 Latvia 1990 1.966140e+04
22 Liechtenstein 1990 1.989709e+02
23 Lithuania 1990 3.576773e+04
24 Luxembourg 1990 1.182335e+04
25 Malta 1990 2.394194e+03
26 Monaco 1990 9.822834e+01
27 Netherlands 1990 1.618069e+05
28 New Zealand 1990 2.550251e+04
29 Norway 1990 3.509663e+04
30 Poland 1990 3.768136e+05
31 Portugal 1990 4.532509e+04
32 Romania 1990 1.734639e+05
33 Russian Federation 1990 2.534865e+06
34 Slovakia 1990 6.147019e+04
35 Slovenia 1990 1.509485e+04
36 Spain 1990 2.313284e+05
37 Sweden 1990 5.758009e+04
38 Switzerland 1990 4.415988e+04
39 Türkiye 1990 1.516645e+05
40 Ukraine 1990 7.058300e+05
41 United Kingdom 1990 6.026521e+05
42 United States 1990 5.122496e+06

Merge entre dataframe de CO2 y SolarEnergy manteniendo lo que es común en ambas columnas

In [311]:
co2_solar = co2_new.merge(solar_new,how='inner',left_on='Country',right_on='Country')
co2_solar
Out[311]:
Country Year_x Value Year_y Unit Quantity Quantity Footnotes
0 Australia 1990 2.781542e+05 2020 Kilowatt-hours, million 21033.194 NaN
1 Austria 1990 6.214525e+04 2020 Kilowatt-hours, million 2042.934 NaN
2 Belarus 1990 1.083451e+05 2020 Kilowatt-hours, million 176.000 NaN
3 Belgium 1990 1.202926e+05 2020 Kilowatt-hours, million 5105.400 NaN
4 Bulgaria 1990 7.669920e+04 2020 Kilowatt-hours, million 1480.856 NaN
5 Canada 1990 4.582183e+05 2020 Kilowatt-hours, million 4846.000 NaN
6 Croatia 1990 2.297979e+04 2020 Kilowatt-hours, million 95.500 NaN
7 Cyprus 1990 4.653217e+03 2020 Kilowatt-hours, million 295.608 NaN
8 Czechia 1990 1.642107e+05 2020 Kilowatt-hours, million 2287.014 NaN
9 Denmark 1990 5.487931e+04 2020 Kilowatt-hours, million 1180.532 NaN
10 Estonia 1990 3.692221e+04 2020 Kilowatt-hours, million 122.540 NaN
11 Finland 1990 5.691434e+04 2020 Kilowatt-hours, million 218.471 NaN
12 France 1990 4.009646e+05 2020 Kilowatt-hours, million 13398.392 NaN
13 Germany 1990 1.051979e+06 2020 Kilowatt-hours, million 48641.000 NaN
14 Greece 1990 8.343804e+04 2020 Kilowatt-hours, million 4446.854 NaN
15 Hungary 1990 7.322554e+04 2020 Kilowatt-hours, million 2459.000 NaN
16 Ireland 1990 3.294442e+04 2020 Kilowatt-hours, million 63.780 NaN
17 Italy 1990 4.395498e+05 2020 Kilowatt-hours, million 24941.504 NaN
18 Japan 1990 1.158129e+06 2020 Kilowatt-hours, million 79086.858 NaN
19 Latvia 1990 1.966140e+04 2020 Kilowatt-hours, million 4.847 NaN
20 Liechtenstein 1990 1.989709e+02 2020 Kilowatt-hours, million 29.700 NaN
21 Lithuania 1990 3.576773e+04 2020 Kilowatt-hours, million 128.800 NaN
22 Luxembourg 1990 1.182335e+04 2020 Kilowatt-hours, million 161.261 NaN
23 Malta 1990 2.394194e+03 2020 Kilowatt-hours, million 236.837 NaN
24 Netherlands 1990 1.618069e+05 2020 Kilowatt-hours, million 8765.172 NaN
25 New Zealand 1990 2.550251e+04 2020 Kilowatt-hours, million 160.216 NaN
26 Norway 1990 3.509663e+04 2020 Kilowatt-hours, million 26.521 NaN
27 Poland 1990 3.768136e+05 2020 Kilowatt-hours, million 1957.916 NaN
28 Portugal 1990 4.532509e+04 2020 Kilowatt-hours, million 1715.590 NaN
29 Romania 1990 1.734639e+05 2020 Kilowatt-hours, million 1733.375 NaN
30 Russian Federation 1990 2.534865e+06 2020 Kilowatt-hours, million 2022.400 NaN
31 Slovakia 1990 6.147019e+04 2020 Kilowatt-hours, million 663.000 NaN
32 Slovenia 1990 1.509485e+04 2020 Kilowatt-hours, million 368.196 NaN
33 Spain 1990 2.313284e+05 2020 Kilowatt-hours, million 20667.000 NaN
34 Sweden 1990 5.758009e+04 2020 Kilowatt-hours, million 1051.000 NaN
35 Switzerland 1990 4.415988e+04 2020 Kilowatt-hours, million 2598.659 NaN
36 Türkiye 1990 1.516645e+05 2020 Kilowatt-hours, million 10950.180 NaN
37 Ukraine 1990 7.058300e+05 2020 Kilowatt-hours, million 5969.400 NaN
38 United Kingdom 1990 6.026521e+05 2020 Kilowatt-hours, million 13157.994 NaN
39 United States 1990 5.122496e+06 2020 Kilowatt-hours, million 119328.543 NaN
In [312]:
OnlyCO2_Solar=set(co2_solar.Country)-set(precipitation_new.Country)
OnlyCO2_Solar
Out[312]:
{'Czechia',
 'Liechtenstein',
 'Russian Federation',
 'Türkiye',
 'United Kingdom',
 'United States'}
In [313]:
OnlyPrep=set(precipitation_new.Country)-set(co2_solar.Country)
OnlyPrep
Out[313]:
{'1',
 '2',
 '3',
 'Afghanistan, Islamic State Of',
 'Albania',
 'Algeria',
 'Argentina',
 'Argentina (Antarctic Stations)',
 'Armenia',
 'Australia (Antarctic Stations)',
 'Azerbaijan',
 'Bahamas',
 'Bahrain',
 'Belize',
 'Benin',
 'Bosnia And Herzegovina',
 'Brazil',
 'Brunei Darussalam',
 'Cameroon',
 'Cape Verde',
 'Chad',
 'Chile',
 'Chile (Antarctic Stations)',
 'China',
 'Colombia',
 'Colombia (San Andres And Providencia Islands)',
 'Costa Rica',
 'Cuba',
 'Czech Republic',
 'Djibouti',
 'Dominica',
 'Dominican Republic',
 'Ecuador',
 'Egypt',
 'El Salvador',
 'Eritrea',
 'Fiji',
 'Footnoteseqid',
 'France (Caribbean Islands, Guadeloupe, Martinique)',
 'France (French Department Of Guyana)',
 'France (Islands In The Indian Ocean)',
 'French Polynesia',
 'Gabon',
 'Georgia',
 'Greenland',
 'Guinea',
 'Guyana',
 'Honduras',
 'Hong Kong',
 'Iceland',
 'India',
 'Iran, Islamic Republic Of',
 'Israel',
 'Ivory Coast',
 'Jordan',
 'Kazakstan (Asia)',
 'Kazakstan (Europe)',
 'Kenya',
 "Korea, Democratic People'S Republic Of",
 'Korea, Republic Of',
 'Kuwait',
 'Kyrgyz Republic',
 "Lao People'S Democratic Republic",
 'Lebanon',
 'Madagascar',
 'Malawi',
 'Malaysia',
 'Maldives',
 'Mali',
 'Mauritius',
 'Mexico',
 'Moldova, Republic Of',
 'Mongolia',
 'Morocco',
 'Myanmar',
 'Netherlands Antilles And Aruba',
 'New Caledonia',
 'Nicaragua',
 'Niger',
 'Nigeria',
 'Oman',
 'Pakistan',
 'Paraguay',
 'Peru',
 'Philippines',
 'Portugal (Madeira)',
 'Qatar',
 'Russian Federation (Asia)',
 'Russian Federation (Europe)',
 'Rwanda',
 'Saudi Arabia',
 'Senegal',
 'Seychelles',
 'Sierra Leone',
 'Singapore',
 'Solomon Islands',
 'South Africa',
 'Spain (Canary Islands, Ceuta And Melilla)',
 'Sri Lanka',
 'Sudan',
 'Syrian Arab Republic',
 'Tajikistan',
 'Tanzania, United Republic Of',
 'Thailand',
 'The Former Yugoslav Republic Of Macedonia',
 'Togo',
 'Trinidad And Tobago',
 'Tunisia',
 'Turkey',
 'Turkmenistan',
 'United Arab Emirates',
 'United Kingdom Of Great Britain & Northern Ireland',
 'United States Of America',
 'United States Of America (Antarctic Stations)',
 'United States Of America (Pacific Islands)',
 'Uruguay',
 'Uzbekistan',
 'Venezuela',
 'Yugoslavia',
 'Zambia',
 'Zimbabwe'}
In [314]:
[(f,fz.extractOne(f, OnlyCO2_Solar)) for f in sorted(OnlyPrep)]
Out[314]:
[('1', ('United Kingdom', 0)),
 ('2', ('United Kingdom', 0)),
 ('3', ('United Kingdom', 0)),
 ('Afghanistan, Islamic State Of', ('United States', 52)),
 ('Albania', ('Liechtenstein', 40)),
 ('Algeria', ('Czechia', 43)),
 ('Argentina', ('Russian Federation', 51)),
 ('Argentina (Antarctic Stations)', ('United States', 52)),
 ('Armenia', ('Czechia', 43)),
 ('Australia (Antarctic Stations)', ('United States', 52)),
 ('Azerbaijan', ('Russian Federation', 53)),
 ('Bahamas', ('United States', 33)),
 ('Bahrain', ('Russian Federation', 55)),
 ('Belize', ('Liechtenstein', 60)),
 ('Benin', ('Liechtenstein', 68)),
 ('Bosnia And Herzegovina', ('Russian Federation', 45)),
 ('Brazil', ('Russian Federation', 45)),
 ('Brunei Darussalam', ('Russian Federation', 43)),
 ('Cameroon', ('Russian Federation', 48)),
 ('Cape Verde', ('Russian Federation', 45)),
 ('Chad', ('Czechia', 68)),
 ('Chile', ('Liechtenstein', 54)),
 ('Chile (Antarctic Stations)', ('Czechia', 54)),
 ('China', ('Czechia', 67)),
 ('Colombia', ('Czechia', 40)),
 ('Colombia (San Andres And Providencia Islands)', ('Czechia', 51)),
 ('Costa Rica', ('Russian Federation', 43)),
 ('Cuba', ('Czechia', 45)),
 ('Czech Republic', ('Czechia', 75)),
 ('Djibouti', ('United Kingdom', 43)),
 ('Dominica', ('United Kingdom', 53)),
 ('Dominican Republic', ('Russian Federation', 37)),
 ('Ecuador', ('Czechia', 43)),
 ('Egypt', ('Liechtenstein', 36)),
 ('El Salvador', ('United States', 33)),
 ('Eritrea', ('United Kingdom', 47)),
 ('Fiji', ('Russian Federation', 34)),
 ('Footnoteseqid', ('United States', 44)),
 ('France (Caribbean Islands, Guadeloupe, Martinique)', ('Czechia', 51)),
 ('France (French Department Of Guyana)', ('Russian Federation', 47)),
 ('France (Islands In The Indian Ocean)', ('United Kingdom', 51)),
 ('French Polynesia', ('Czechia', 49)),
 ('Gabon', ('Russian Federation', 54)),
 ('Georgia', ('Czechia', 43)),
 ('Greenland', ('Russian Federation', 40)),
 ('Guinea', ('United Kingdom', 49)),
 ('Guyana', ('United Kingdom', 45)),
 ('Honduras', ('Russian Federation', 45)),
 ('Hong Kong', ('United Kingdom', 43)),
 ('Iceland', ('Liechtenstein', 51)),
 ('India', ('United Kingdom', 54)),
 ('Iran, Islamic Republic Of', ('Russian Federation', 36)),
 ('Israel', ('United States', 47)),
 ('Ivory Coast', ('Russian Federation', 39)),
 ('Jordan', ('Russian Federation', 45)),
 ('Kazakstan (Asia)', ('Czechia', 40)),
 ('Kazakstan (Europe)', ('Russian Federation', 40)),
 ('Kenya', ('Liechtenstein', 45)),
 ("Korea, Democratic People'S Republic Of", ('Russian Federation', 48)),
 ('Korea, Republic Of', ('Russian Federation', 38)),
 ('Kuwait', ('United Kingdom', 54)),
 ('Kyrgyz Republic', ('Türkiye', 40)),
 ("Lao People'S Democratic Republic", ('Russian Federation', 44)),
 ('Lebanon', ('Russian Federation', 51)),
 ('Madagascar', ('Russian Federation', 27)),
 ('Malawi', ('Liechtenstein', 45)),
 ('Malaysia', ('Russian Federation', 43)),
 ('Maldives', ('Liechtenstein', 49)),
 ('Mali', ('Liechtenstein', 60)),
 ('Mauritius', ('Russian Federation', 48)),
 ('Mexico', ('Russian Federation', 45)),
 ('Moldova, Republic Of', ('Russian Federation', 36)),
 ('Mongolia', ('United Kingdom', 42)),
 ('Morocco', ('Czechia', 29)),
 ('Myanmar', ('Russian Federation', 38)),
 ('Netherlands Antilles And Aruba', ('United States', 48)),
 ('New Caledonia', ('Russian Federation', 43)),
 ('Nicaragua', ('Russian Federation', 38)),
 ('Niger', ('United Kingdom', 54)),
 ('Nigeria', ('United Kingdom', 45)),
 ('Oman', ('United Kingdom', 60)),
 ('Pakistan', ('United Kingdom', 47)),
 ('Paraguay', ('Russian Federation', 39)),
 ('Peru', ('Russian Federation', 60)),
 ('Philippines', ('Czechia', 36)),
 ('Portugal (Madeira)', ('Czechia', 49)),
 ('Qatar', ('United States', 43)),
 ('Russian Federation (Asia)', ('Russian Federation', 95)),
 ('Russian Federation (Europe)', ('Russian Federation', 95)),
 ('Rwanda', ('Russian Federation', 45)),
 ('Saudi Arabia', ('Russian Federation', 43)),
 ('Senegal', ('Liechtenstein', 45)),
 ('Seychelles', ('Liechtenstein', 43)),
 ('Sierra Leone', ('Russian Federation', 47)),
 ('Singapore', ('United Kingdom', 48)),
 ('Solomon Islands', ('United Kingdom', 39)),
 ('South Africa', ('Czechia', 40)),
 ('Spain (Canary Islands, Ceuta And Melilla)', ('Czechia', 42)),
 ('Sri Lanka', ('Russian Federation', 45)),
 ('Sudan', ('Russian Federation', 57)),
 ('Syrian Arab Republic', ('Russian Federation', 45)),
 ('Tajikistan', ('Russian Federation', 39)),
 ('Tanzania, United Republic Of', ('United Kingdom', 86)),
 ('Thailand', ('Russian Federation', 47)),
 ('The Former Yugoslav Republic Of Macedonia', ('Czechia', 51)),
 ('Togo', ('United Kingdom', 45)),
 ('Trinidad And Tobago', ('Türkiye', 54)),
 ('Tunisia', ('Russian Federation', 55)),
 ('Turkey', ('Türkiye', 62)),
 ('Turkmenistan', ('Türkiye', 55)),
 ('United Arab Emirates', ('United States', 86)),
 ('United Kingdom Of Great Britain & Northern Ireland',
  ('United Kingdom', 90)),
 ('United States Of America', ('United States', 90)),
 ('United States Of America (Antarctic Stations)', ('United States', 90)),
 ('United States Of America (Pacific Islands)', ('United States', 90)),
 ('Uruguay', ('Russian Federation', 42)),
 ('Uzbekistan', ('Czechia', 47)),
 ('Venezuela', ('Russian Federation', 40)),
 ('Yugoslavia', ('Russian Federation', 45)),
 ('Zambia', ('Czechia', 46)),
 ('Zimbabwe', ('Czechia', 40))]
In [315]:
[(f,fz.extractOne(f, OnlyPrep)) for f in sorted(OnlyCO2_Solar)]
#Notamos que al extraer los matchs de ambas formas se pierden datos si se toman valores >=87 o >=75 o >=62
#En algunos casos se toma datos incorrectos y en otros se obvia datos, por tanto los datos faltantes se modificarán manualmente
Out[315]:
[('Czechia', ('Czech Republic', 75)),
 ('Liechtenstein', ('Benin', 68)),
 ('Russian Federation', ('Russian Federation (Asia)', 95)),
 ('Türkiye', ('Turkey', 62)),
 ('United Kingdom',
  ('United Kingdom Of Great Britain & Northern Ireland', 90)),
 ('United States', ('United States Of America (Pacific Islands)', 90))]
In [316]:
match_CO2_Solar_Prep = [(f,fz.extractOne(f, OnlyCO2_Solar)) for f in sorted(OnlyPrep)
 if fz.extractOne(f, OnlyCO2_Solar)[1]>=87]
match_CO2_Solar_Prep
Out[316]:
[('Russian Federation (Asia)', ('Russian Federation', 95)),
 ('Russian Federation (Europe)', ('Russian Federation', 95)),
 ('United Kingdom Of Great Britain & Northern Ireland',
  ('United Kingdom', 90)),
 ('United States Of America', ('United States', 90)),
 ('United States Of America (Antarctic Stations)', ('United States', 90)),
 ('United States Of America (Pacific Islands)', ('United States', 90))]
In [317]:
match_CO2_Solar_Prep= {f:fz.extractOne(f, OnlyCO2_Solar)[0] 
                 for f in sorted(OnlyPrep)
                 if fz.extractOne(f, OnlyCO2_Solar)[1] >=87}
match_CO2_Solar_Prep
Out[317]:
{'Russian Federation (Asia)': 'Russian Federation',
 'Russian Federation (Europe)': 'Russian Federation',
 'United Kingdom Of Great Britain & Northern Ireland': 'United Kingdom',
 'United States Of America': 'United States',
 'United States Of America (Antarctic Stations)': 'United States',
 'United States Of America (Pacific Islands)': 'United States'}
In [318]:
precipitation_new.Country.replace(to_replace=match_CO2_Solar_Prep,inplace=True)
In [319]:
precipitation_new.Country=precipitation_new.Country.str.replace('Czech Republic', 'Czechia')
precipitation_new.loc[precipitation_new['Country']=='Czechia']
Out[319]:
Country Station Name Period Statistic Description Unit Jan Jan Footnotes Feb Feb Footnotes Mar ... Oct Oct Footnotes Nov Nov Footnotes Dec Dec Footnotes Annual Annual Footnotes Annual NCDC Computed Value Annual NCDC Computed Value Footnotes
888 Czechia Cheb 1961-1990 Mean Monthly Value mm 36.1 NaN 29.5 NaN 36.3 ... 37.5 NaN 41.1 NaN 43.9 NaN 559.8 NaN 559.7 NaN
889 Czechia Praha 6, Ruzyne 1961-1990 Mean Monthly Value mm 23.6 NaN 22.6 NaN 28.1 ... 30.5 NaN 31.9 NaN 25.3 NaN 526.2 NaN 526.3 NaN
890 Czechia Brno 4, Turany 1961-1990 Mean Monthly Value mm 24.5 NaN 23.7 NaN 24.2 ... 30.5 NaN 37.5 NaN 27.1 NaN 490.1 NaN 490.0 NaN
891 Czechia Mosnov, Ostrava 1961-1990 Mean Monthly Value mm 26.7 NaN 30.2 NaN 34.0 ... 42.3 NaN 44.6 NaN 34.3 NaN 701.8 NaN 701.8 NaN

4 rows × 33 columns

In [320]:
precipitation_new.Country=precipitation_new.Country.str.replace('Turkey', 'Türkiye')
precipitation_new.loc[precipitation_new['Country']=='Türkiye']
Out[320]:
Country Station Name Period Statistic Description Unit Jan Jan Footnotes Feb Feb Footnotes Mar ... Oct Oct Footnotes Nov Nov Footnotes Dec Dec Footnotes Annual Annual Footnotes Annual NCDC Computed Value Annual NCDC Computed Value Footnotes
3088 Türkiye ZONGULDAK 1961-1990 Mean Monthly Value mm 136.9 NaN 91.6 NaN 89.8 ... 139.1 NaN 142.3 NaN 159.3 NaN -9999.9 1.0 1194.1 NaN
3089 Türkiye SAMSUN 1961-1990 Mean Monthly Value mm 61.4 NaN 50.0 NaN 55.8 ... 84.7 NaN 89.3 NaN 82.2 NaN -9999.9 1.0 691.5 NaN
3090 Türkiye GIRESUN 1961-1990 Mean Monthly Value mm 121.6 NaN 89.2 NaN 88.9 ... 159.6 NaN 142.7 NaN 131.3 NaN -9999.9 1.0 1236.1 NaN
3091 Türkiye RIZE 1961-1990 Mean Monthly Value mm 216.5 NaN 172.7 NaN 147.1 ... 272.2 NaN 249.5 NaN 242.5 NaN -9999.9 1.0 2171.0 NaN
3092 Türkiye EDIRNE 1961-1990 Mean Monthly Value mm 61.4 NaN 53.3 NaN 55.0 ... 44.8 NaN 64.0 NaN 70.6 NaN -9999.9 1.0 577.6 NaN
3093 Türkiye GOZTEPE 1961-1990 Mean Monthly Value mm 98.7 NaN 66.9 NaN 62.1 ... 70.7 NaN 88.9 NaN 121.7 NaN -9999.9 1.0 696.7 NaN
3094 Türkiye KASTAMONU 1961-1990 Mean Monthly Value mm 33.2 NaN 27.6 NaN 32.9 ... 39.3 NaN 29.3 NaN 37.6 NaN -9999.9 1.0 483.6 NaN
3095 Türkiye CORUM 1961-1990 Mean Monthly Value mm 41.9 NaN 31.3 NaN 36.0 ... 30.4 NaN 36.5 NaN 50.7 NaN -9999.9 1.0 449.2 NaN
3096 Türkiye SIVAS 1961-1990 Mean Monthly Value mm 41.4 NaN 36.6 NaN 46.1 ... 32.1 NaN 38.9 NaN 47.2 NaN -9999.9 1.0 427.4 NaN
3097 Türkiye ERZINCAN 1961-1990 Mean Monthly Value mm 29.4 NaN 29.2 NaN 39.1 ... 41.9 NaN 39.2 NaN 31.3 NaN -9999.9 1.0 381.2 NaN
3098 Türkiye ERZURUM 1961-1990 Mean Monthly Value mm 21.3 NaN 25.3 NaN 29.2 ... 45.8 NaN 36.9 NaN 21.3 NaN -9999.9 1.0 414.1 NaN
3099 Türkiye KARS 1961-1990 Mean Monthly Value mm 18.8 NaN 22.4 NaN 26.0 ... 38.5 NaN 25.7 NaN 20.8 NaN -9999.9 1.0 462.9 NaN
3100 Türkiye CANAKKALE 1961-1990 Mean Monthly Value mm 97.4 NaN 66.2 NaN 61.1 ... 47.8 NaN 88.8 NaN 117.5 NaN -9999.9 1.0 634.1 NaN
3101 Türkiye BURSA 1961-1990 Mean Monthly Value mm 89.6 NaN 85.8 NaN 64.0 ... 58.0 NaN 80.6 NaN 118.4 NaN -9999.9 1.0 706.1 NaN
3102 Türkiye ESKISEHIR 1961-1990 Mean Monthly Value mm 45.2 NaN 33.9 NaN 40.5 ... 28.8 NaN 31.0 NaN 50.8 NaN -9999.9 1.0 387.8 NaN
3103 Türkiye ANKARA 1961-1990 Mean Monthly Value mm 47.0 NaN 36.3 NaN 36.3 ... 26.8 NaN 33.4 NaN 49.0 NaN -9999.9 1.0 414.6 NaN
3104 Türkiye BALIKESIR 1961-1990 Mean Monthly Value mm 87.8 NaN 63.6 NaN 55.6 ... 38.0 NaN 78.8 NaN 107.8 NaN -9999.9 1.0 587.2 NaN
3105 Türkiye VAN 1961-1990 Mean Monthly Value mm 34.8 NaN 33.9 NaN 42.3 ... 48.3 NaN 42.3 NaN 33.5 NaN -9999.9 1.0 385.1 NaN
3106 Türkiye USAK 1961-1990 Mean Monthly Value mm 75.2 NaN 65.5 NaN 57.3 ... 36.1 NaN 58.5 NaN 86.2 NaN -9999.9 1.0 541.1 NaN
3107 Türkiye AFYON 1961-1990 Mean Monthly Value mm 37.3 NaN 34.9 NaN 39.8 ... 37.2 NaN 30.8 NaN 46.2 NaN -9999.9 1.0 398.6 NaN
3108 Türkiye KAYSERI 1961-1990 Mean Monthly Value mm 32.7 NaN 31.2 NaN 40.2 ... 28.2 NaN 36.0 NaN 39.3 NaN -9999.9 1.0 386.7 NaN
3109 Türkiye MALATYA 1961-1990 Mean Monthly Value mm 41.9 NaN 35.6 NaN 59.5 ... 40.4 NaN 47.0 NaN 42.3 NaN -9999.9 1.0 410.9 NaN
3110 Türkiye SIIRT 1961-1990 Mean Monthly Value mm 89.5 NaN 96.7 NaN 102.0 ... 52.7 NaN 81.5 NaN 96.9 NaN -9999.9 1.0 710.4 NaN
3111 Türkiye IZMIR 1961-1990 Mean Monthly Value mm 132.3 NaN 99.1 NaN 76.4 ... 40.3 NaN 87.1 NaN 153.0 NaN -9999.9 1.0 692.0 NaN
3112 Türkiye ISPARTA 1961-1990 Mean Monthly Value mm 78.7 NaN 74.3 NaN 54.6 ... 36.4 NaN 46.3 NaN 90.1 NaN -9999.9 1.0 546.2 NaN
3113 Türkiye KONYA 1961-1990 Mean Monthly Value mm 38.8 NaN 32.0 NaN 28.8 ... 31.7 NaN 32.5 NaN 44.1 NaN -9999.9 1.0 338.2 NaN
3114 Türkiye NIGDE 1961-1990 Mean Monthly Value mm 30.5 NaN 31.3 NaN 33.1 ... 23.7 NaN 30.0 NaN 40.1 NaN -9999.9 1.0 323.6 NaN
3115 Türkiye KAHRAMANMARAS 1961-1990 Mean Monthly Value mm 125.1 NaN 110.9 NaN 96.6 ... 48.7 NaN 77.0 NaN 132.0 NaN -9999.9 1.0 718.6 NaN
3116 Türkiye URFA 1961-1990 Mean Monthly Value mm 84.6 NaN 73.5 NaN 71.1 ... 32.5 NaN 42.6 NaN 87.0 NaN -9999.9 1.0 482.1 NaN
3117 Türkiye DIYARBAKIR 1961-1990 Mean Monthly Value mm 70.3 NaN 69.7 NaN 70.9 ... 34.0 NaN 52.2 NaN 75.9 NaN -9999.9 1.0 498.4 NaN
3118 Türkiye HAKKARI 1961-1990 Mean Monthly Value mm 89.3 NaN 105.2 NaN 131.1 ... 70.4 NaN 96.7 NaN 93.4 NaN -9999.9 1.0 818.4 NaN
3119 Türkiye MUGLA 1961-1990 Mean Monthly Value mm 243.9 NaN 191.9 NaN 121.3 ... 71.9 NaN 135.3 NaN 281.0 NaN -9999.9 1.0 1222.1 NaN
3120 Türkiye ANTALYA 1961-1990 Mean Monthly Value mm 237.9 NaN 191.0 NaN 101.5 ... 70.4 NaN 149.7 NaN 222.7 NaN -9999.9 1.0 1077.5 NaN
3121 Türkiye MERSIN 1961-1990 Mean Monthly Value mm 120.5 NaN 110.4 NaN 57.9 ... 40.2 NaN 75.7 NaN 135.6 NaN -9999.9 1.0 633.8 NaN
3122 Türkiye ADANA 1961-1990 Mean Monthly Value mm 117.7 NaN 91.8 NaN 68.1 ... 46.6 NaN 74.6 NaN 138.3 NaN -9999.9 1.0 698.1 NaN

35 rows × 33 columns

Ahora, sí se procede a realizar el merge entre el datafram anterior y el de precipitacion:

In [321]:
co2_solar_prep = co2_solar.merge(precipitation_new,how='inner',left_on='Country',right_on='Country')
co2_solar_prep
Out[321]:
Country Year_x Value Year_y Unit_x Quantity Quantity Footnotes Station Name Period Statistic Description ... Oct Oct Footnotes Nov Nov Footnotes Dec Dec Footnotes Annual Annual Footnotes Annual NCDC Computed Value Annual NCDC Computed Value Footnotes
0 Australia 1990 2.781542e+05 2020 Kilowatt-hours, million 21033.194 NaN ALBANY (ALBANY TOWN) 1961-1965 Mean Monthly Value ... 96.0 NaN 37.9 NaN 23.0 NaN -9999.9 1.0 1012.3 NaN
1 Australia 1990 2.781542e+05 2020 Kilowatt-hours, million 21033.194 NaN ESPERANCE (ESPERANCE POST OFFICE) 1961-1969 Mean Monthly Value ... 51.3 NaN 19.3 NaN 17.9 NaN -9999.9 1.0 698.8 NaN
2 Australia 1990 2.781542e+05 2020 Kilowatt-hours, million 21033.194 NaN TENNANT CRK POST OFFICE WAS 014050 1961-1970 Mean Monthly Value ... 14.3 NaN 11.8 NaN 40.6 NaN -9999.9 1.0 321.3 NaN
3 Australia 1990 2.781542e+05 2020 Kilowatt-hours, million 21033.194 NaN LONGREACH POST OFFICE 1961-1973 Mean Monthly Value ... 19.6 NaN 22.3 NaN 84.1 NaN -9999.9 1.0 433.7 NaN
4 Australia 1990 2.781542e+05 2020 Kilowatt-hours, million 21033.194 NaN ESSENDON AIRPORT AMO 1961-1972 Mean Monthly Value ... 51.5 NaN 42.7 NaN 57.5 NaN -9999.9 1.0 594.9 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2570 United States 1990 5.122496e+06 2020 Kilowatt-hours, million 119328.543 NaN POHNPEI, PI 1961-1990 Mean Monthly Value ... 424.4 NaN 399.8 NaN 386.6 NaN 4769.1 NaN 4769.1 NaN
2571 United States 1990 5.122496e+06 2020 Kilowatt-hours, million 119328.543 NaN KWAJALEIN ISLAND, PI 1961-1990 Mean Monthly Value ... 302.5 NaN 270.8 NaN 205.7 NaN 2593.0 NaN 2593.0 NaN
2572 United States 1990 5.122496e+06 2020 Kilowatt-hours, million 119328.543 NaN MAJURO, PI 1961-1990 Mean Monthly Value ... 351.5 NaN 325.1 NaN 301.0 NaN 3336.0 NaN 3336.0 NaN
2573 United States 1990 5.122496e+06 2020 Kilowatt-hours, million 119328.543 NaN KOROR/W CAROLINE ISLANDS PI 1961-1990 Mean Monthly Value ... 352.3 NaN 287.5 NaN 304.3 NaN 3758.3 NaN 3758.3 NaN
2574 United States 1990 5.122496e+06 2020 Kilowatt-hours, million 119328.543 NaN YAP ISLAND/CAROLINE ISLANDS PI 1961-1990 Mean Monthly Value ... 304.0 NaN 230.4 NaN 228.3 NaN 3049.5 NaN 3049.5 NaN

2575 rows × 39 columns

A todo este resultado le asigno un nuevo nombre y es el que será modificado.

In [322]:
pd.options.mode.chained_assignment = None
In [323]:
DF = co2_solar_prep.copy()
DF.head()
Out[323]:
Country Year_x Value Year_y Unit_x Quantity Quantity Footnotes Station Name Period Statistic Description ... Oct Oct Footnotes Nov Nov Footnotes Dec Dec Footnotes Annual Annual Footnotes Annual NCDC Computed Value Annual NCDC Computed Value Footnotes
0 Australia 1990 278154.156296 2020 Kilowatt-hours, million 21033.194 NaN ALBANY (ALBANY TOWN) 1961-1965 Mean Monthly Value ... 96.0 NaN 37.9 NaN 23.0 NaN -9999.9 1.0 1012.3 NaN
1 Australia 1990 278154.156296 2020 Kilowatt-hours, million 21033.194 NaN ESPERANCE (ESPERANCE POST OFFICE) 1961-1969 Mean Monthly Value ... 51.3 NaN 19.3 NaN 17.9 NaN -9999.9 1.0 698.8 NaN
2 Australia 1990 278154.156296 2020 Kilowatt-hours, million 21033.194 NaN TENNANT CRK POST OFFICE WAS 014050 1961-1970 Mean Monthly Value ... 14.3 NaN 11.8 NaN 40.6 NaN -9999.9 1.0 321.3 NaN
3 Australia 1990 278154.156296 2020 Kilowatt-hours, million 21033.194 NaN LONGREACH POST OFFICE 1961-1973 Mean Monthly Value ... 19.6 NaN 22.3 NaN 84.1 NaN -9999.9 1.0 433.7 NaN
4 Australia 1990 278154.156296 2020 Kilowatt-hours, million 21033.194 NaN ESSENDON AIRPORT AMO 1961-1972 Mean Monthly Value ... 51.5 NaN 42.7 NaN 57.5 NaN -9999.9 1.0 594.9 NaN

5 rows × 39 columns

In [324]:
co2_solar.shape
Out[324]:
(40, 7)

Notamos que con el último merge se perdió solo un dato

In [325]:
co2_solar_prep.groupby('Country')[['Annual NCDC Computed Value']].agg('mean').shape
Out[325]:
(39, 1)

2. Merge con el archivo adjunto¶

El archivo adjuntado en Paideia PUCP fue subido al repositorio de Github

In [326]:
country_isosLink = 'https://github.com/IntroSpatialDataScience/ClimateChangeData/raw/main/data/country_isos.xlsx'
country_Isos = pd.read_excel(country_isosLink)
In [327]:
country_Isos
Out[327]:
Country Officialstatename InternetccTLD iso2 iso3
0 AFGHANISTAN The Islamic Republic of Afghanistan .af AF AFG
1 ALBANIA The Republic of Albania .al AL ALB
2 ALGERIA The People's Democratic Republic of Algeria .dz DZ DZA
3 ANGOLA The Republic of Angola .ao AO AGO
4 ANTIGUA AND BARBUDA Antigua and Barbuda .ag AG ATG
... ... ... ... ... ...
166 UZBEKISTAN The Republic of Uzbekistan .uz UZ UZB
167 VIET NAM The Socialist Republic of Viet Nam .vn VN VNM
168 YEMEN The Republic of Yemen .ye YE YEM
169 ZAMBIA The Republic of Zambia .zm ZM ZMB
170 ZIMBABWE The Republic of Zimbabwe .zw ZW ZWE

171 rows × 5 columns

In [328]:
country_Isos.Country=country_Isos.Country.str.title()
In [329]:
OnlyDF=set(DF.Country)-set(country_Isos.Country)
OnlyDF
Out[329]:
{'Netherlands',
 'Russian Federation',
 'Türkiye',
 'United Kingdom',
 'United States'}
In [330]:
OnlyIsos=set(country_Isos.Country)-set(DF.Country)
OnlyIsos
Out[330]:
{'Afghanistan',
 'Albania',
 'Algeria',
 'Angola',
 'Antigua And Barbuda',
 'Argentina',
 'Armenia',
 'Azerbaijan',
 'Bahamas (The)',
 'Bahrain',
 'Bangladesh',
 'Barbados',
 'Belize',
 'Benin',
 'Bhutan',
 'Bolivia (Plurinational State Of)',
 'Bosnia And Herzegovina',
 'Botswana',
 'Brazil',
 'Brunei Darussalam',
 'Burkina Faso',
 'Burundi',
 'Cabo Verde',
 'Cambodia',
 'Cameroon',
 'Central African Republic (The)',
 'Chad',
 'Chile',
 'China',
 'Colombia',
 'Comoros (The)',
 'Congo (The Democratic Republic Of The)',
 'Congo (The)',
 'Costa Rica',
 'Cuba',
 "C√Îte D'Ivoire",
 'Djibouti',
 'Dominican Republic (The)',
 'Ecuador',
 'Egypt',
 'El Salvador',
 'Equatorial Guinea',
 'Eswatini',
 'Ethiopia',
 'Fiji',
 'Gabon',
 'Gambia (The)',
 'Georgia',
 'Ghana',
 'Grenada',
 'Guatemala',
 'Guinea',
 'Guinea-Bissau',
 'Guyana',
 'Haiti',
 'Honduras',
 'Iceland',
 'India',
 'Indonesia',
 'Iran (Islamic Republic Of)',
 'Iraq',
 'Jamaica',
 'Jordan',
 'Kazakhstan',
 'Kenya',
 'Korea (The Republic Of)',
 'Kuwait',
 'Kyrgyzstan',
 "Lao People'S Democratic Republic (The)",
 'Lebanon',
 'Lesotho',
 'Liberia',
 'Libya',
 'Madagascar',
 'Malawi',
 'Malaysia',
 'Maldives',
 'Mali',
 'Mauritania',
 'Mauritius',
 'Mexico',
 'Micronesia (Federated States Of)',
 'Moldova (The Republic Of)',
 'Mongolia',
 'Montenegro',
 'Morocco',
 'Mozambique',
 'Myanmar',
 'Namibia',
 'Nepal',
 'Netherlands (The)',
 'Nicaragua',
 'Niger (The)',
 'Nigeria',
 'North Macedonia',
 'Oman',
 'Pakistan',
 'Panama',
 'Papua New Guinea',
 'Paraguay',
 'Peru',
 'Philippines (The)',
 'Qatar',
 'Russian Federation (The)',
 'Rwanda',
 'Samoa',
 'Sao Tome And Principe',
 'Saudi Arabia',
 'Senegal',
 'Serbia',
 'Seychelles',
 'Sierra Leone',
 'Singapore',
 'Solomon Islands',
 'South Sudan',
 'Sri Lanka',
 'Sudan (The)',
 'Suriname',
 'Tajikistan',
 'Tanzania, The United Republic Of',
 'Thailand',
 'Timor-Leste',
 'Togo',
 'Trinidad And Tobago',
 'Tunisia',
 'Turkey',
 'Turkmenistan',
 'Uganda',
 'United Arab Emirates (The)',
 'United Kingdom Of Great Britain And Northern Ireland (The)',
 'United States Minor Outlying Islands (The)',
 'Uruguay',
 'Uzbekistan',
 'Viet Nam',
 'Yemen',
 'Zambia',
 'Zimbabwe'}
In [331]:
[(f,fz.extractOne(f, OnlyIsos)) for f in sorted(OnlyDF)]
Out[331]:
[('Netherlands', ('Netherlands (The)', 95)),
 ('Russian Federation', ('Russian Federation (The)', 95)),
 ('Türkiye', ('Turkey', 62)),
 ('United Kingdom',
  ('United Kingdom Of Great Britain And Northern Ireland (The)', 90)),
 ('United States', ('United States Minor Outlying Islands (The)', 90))]
In [332]:
match_DF_Isos = [(f,fz.extractOne(f, OnlyIsos)) for f in sorted(OnlyDF)
 if fz.extractOne(f, OnlyIsos)[1]>=60]
match_DF_Isos= {f:fz.extractOne(f, OnlyIsos)[0] 
                 for f in sorted(OnlyDF)
                 if fz.extractOne(f, OnlyIsos)[1] >=60}
match_DF_Isos
Out[332]:
{'Netherlands': 'Netherlands (The)',
 'Russian Federation': 'Russian Federation (The)',
 'Türkiye': 'Turkey',
 'United Kingdom': 'United Kingdom Of Great Britain And Northern Ireland (The)',
 'United States': 'United States Minor Outlying Islands (The)'}
In [333]:
#Invierto el dict para reemplazar los valores en el dataframe countryIsos
match_DF_Isos = dict(zip(match_DF_Isos.values(), match_DF_Isos.keys()))
match_DF_Isos
Out[333]:
{'Netherlands (The)': 'Netherlands',
 'Russian Federation (The)': 'Russian Federation',
 'Turkey': 'Türkiye',
 'United Kingdom Of Great Britain And Northern Ireland (The)': 'United Kingdom',
 'United States Minor Outlying Islands (The)': 'United States'}

Realizo el reemplazo de los nuevos nombres

In [334]:
country_Isos.Country.replace(to_replace=match_DF_Isos,inplace=True)
country_Isos
Out[334]:
Country Officialstatename InternetccTLD iso2 iso3
0 Afghanistan The Islamic Republic of Afghanistan .af AF AFG
1 Albania The Republic of Albania .al AL ALB
2 Algeria The People's Democratic Republic of Algeria .dz DZ DZA
3 Angola The Republic of Angola .ao AO AGO
4 Antigua And Barbuda Antigua and Barbuda .ag AG ATG
... ... ... ... ... ...
166 Uzbekistan The Republic of Uzbekistan .uz UZ UZB
167 Viet Nam The Socialist Republic of Viet Nam .vn VN VNM
168 Yemen The Republic of Yemen .ye YE YEM
169 Zambia The Republic of Zambia .zm ZM ZMB
170 Zimbabwe The Republic of Zimbabwe .zw ZW ZWE

171 rows × 5 columns

Realizo el merge entre mi Dataframe y el nuevo DF manteniendo lo común entre ambos.

In [335]:
DF = DF.merge(country_Isos,how='inner',left_on='Country',right_on='Country')
DF
Out[335]:
Country Year_x Value Year_y Unit_x Quantity Quantity Footnotes Station Name Period Statistic Description ... Dec Dec Footnotes Annual Annual Footnotes Annual NCDC Computed Value Annual NCDC Computed Value Footnotes Officialstatename InternetccTLD iso2 iso3
0 Australia 1990 2.781542e+05 2020 Kilowatt-hours, million 21033.194 NaN ALBANY (ALBANY TOWN) 1961-1965 Mean Monthly Value ... 23.0 NaN -9999.9 1.0 1012.3 NaN The Commonwealth of Australia .au AU AUS
1 Australia 1990 2.781542e+05 2020 Kilowatt-hours, million 21033.194 NaN ESPERANCE (ESPERANCE POST OFFICE) 1961-1969 Mean Monthly Value ... 17.9 NaN -9999.9 1.0 698.8 NaN The Commonwealth of Australia .au AU AUS
2 Australia 1990 2.781542e+05 2020 Kilowatt-hours, million 21033.194 NaN TENNANT CRK POST OFFICE WAS 014050 1961-1970 Mean Monthly Value ... 40.6 NaN -9999.9 1.0 321.3 NaN The Commonwealth of Australia .au AU AUS
3 Australia 1990 2.781542e+05 2020 Kilowatt-hours, million 21033.194 NaN LONGREACH POST OFFICE 1961-1973 Mean Monthly Value ... 84.1 NaN -9999.9 1.0 433.7 NaN The Commonwealth of Australia .au AU AUS
4 Australia 1990 2.781542e+05 2020 Kilowatt-hours, million 21033.194 NaN ESSENDON AIRPORT AMO 1961-1972 Mean Monthly Value ... 57.5 NaN -9999.9 1.0 594.9 NaN The Commonwealth of Australia .au AU AUS
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2570 United States 1990 5.122496e+06 2020 Kilowatt-hours, million 119328.543 NaN POHNPEI, PI 1961-1990 Mean Monthly Value ... 386.6 NaN 4769.1 NaN 4769.1 NaN Baker Island, Howland Island, Jarvis Island... NaN UM UMI
2571 United States 1990 5.122496e+06 2020 Kilowatt-hours, million 119328.543 NaN KWAJALEIN ISLAND, PI 1961-1990 Mean Monthly Value ... 205.7 NaN 2593.0 NaN 2593.0 NaN Baker Island, Howland Island, Jarvis Island... NaN UM UMI
2572 United States 1990 5.122496e+06 2020 Kilowatt-hours, million 119328.543 NaN MAJURO, PI 1961-1990 Mean Monthly Value ... 301.0 NaN 3336.0 NaN 3336.0 NaN Baker Island, Howland Island, Jarvis Island... NaN UM UMI
2573 United States 1990 5.122496e+06 2020 Kilowatt-hours, million 119328.543 NaN KOROR/W CAROLINE ISLANDS PI 1961-1990 Mean Monthly Value ... 304.3 NaN 3758.3 NaN 3758.3 NaN Baker Island, Howland Island, Jarvis Island... NaN UM UMI
2574 United States 1990 5.122496e+06 2020 Kilowatt-hours, million 119328.543 NaN YAP ISLAND/CAROLINE ISLANDS PI 1961-1990 Mean Monthly Value ... 228.3 NaN 3049.5 NaN 3049.5 NaN Baker Island, Howland Island, Jarvis Island... NaN UM UMI

2575 rows × 43 columns

In [336]:
co2_solar_prep.groupby('Country')[['Annual NCDC Computed Value']].agg('mean').shape
Out[336]:
(39, 1)

Vemos que se ha mantenido el número de datos

In [337]:
DF.groupby('Country')[['Annual NCDC Computed Value']].agg('mean').shape
Out[337]:
(39, 1)

3. Pre-procesamiento del nuevo DataFrame¶

Mantengo solo las columnas de mi interés

Cleaning my DF¶

In [338]:
DF = DF[['Country','Value','Quantity','Station Name','Annual NCDC Computed Value','Officialstatename','InternetccTLD','iso2','iso3']]
DF
Out[338]:
Country Value Quantity Station Name Annual NCDC Computed Value Officialstatename InternetccTLD iso2 iso3
0 Australia 2.781542e+05 21033.194 ALBANY (ALBANY TOWN) 1012.3 The Commonwealth of Australia .au AU AUS
1 Australia 2.781542e+05 21033.194 ESPERANCE (ESPERANCE POST OFFICE) 698.8 The Commonwealth of Australia .au AU AUS
2 Australia 2.781542e+05 21033.194 TENNANT CRK POST OFFICE WAS 014050 321.3 The Commonwealth of Australia .au AU AUS
3 Australia 2.781542e+05 21033.194 LONGREACH POST OFFICE 433.7 The Commonwealth of Australia .au AU AUS
4 Australia 2.781542e+05 21033.194 ESSENDON AIRPORT AMO 594.9 The Commonwealth of Australia .au AU AUS
... ... ... ... ... ... ... ... ... ...
2570 United States 5.122496e+06 119328.543 POHNPEI, PI 4769.1 Baker Island, Howland Island, Jarvis Island... NaN UM UMI
2571 United States 5.122496e+06 119328.543 KWAJALEIN ISLAND, PI 2593.0 Baker Island, Howland Island, Jarvis Island... NaN UM UMI
2572 United States 5.122496e+06 119328.543 MAJURO, PI 3336.0 Baker Island, Howland Island, Jarvis Island... NaN UM UMI
2573 United States 5.122496e+06 119328.543 KOROR/W CAROLINE ISLANDS PI 3758.3 Baker Island, Howland Island, Jarvis Island... NaN UM UMI
2574 United States 5.122496e+06 119328.543 YAP ISLAND/CAROLINE ISLANDS PI 3049.5 Baker Island, Howland Island, Jarvis Island... NaN UM UMI

2575 rows × 9 columns

In [339]:
pd.options.mode.chained_assignment = None #desactivar mensaje de emergencia
In [340]:
DF.rename(columns={'Value':'CO2_anual','Quantity':'solar_energy_anual','Annual NCDC Computed Value':'precipitation_anual'}, inplace=True)
In [341]:
DF
Out[341]:
Country CO2_anual solar_energy_anual Station Name precipitation_anual Officialstatename InternetccTLD iso2 iso3
0 Australia 2.781542e+05 21033.194 ALBANY (ALBANY TOWN) 1012.3 The Commonwealth of Australia .au AU AUS
1 Australia 2.781542e+05 21033.194 ESPERANCE (ESPERANCE POST OFFICE) 698.8 The Commonwealth of Australia .au AU AUS
2 Australia 2.781542e+05 21033.194 TENNANT CRK POST OFFICE WAS 014050 321.3 The Commonwealth of Australia .au AU AUS
3 Australia 2.781542e+05 21033.194 LONGREACH POST OFFICE 433.7 The Commonwealth of Australia .au AU AUS
4 Australia 2.781542e+05 21033.194 ESSENDON AIRPORT AMO 594.9 The Commonwealth of Australia .au AU AUS
... ... ... ... ... ... ... ... ... ...
2570 United States 5.122496e+06 119328.543 POHNPEI, PI 4769.1 Baker Island, Howland Island, Jarvis Island... NaN UM UMI
2571 United States 5.122496e+06 119328.543 KWAJALEIN ISLAND, PI 2593.0 Baker Island, Howland Island, Jarvis Island... NaN UM UMI
2572 United States 5.122496e+06 119328.543 MAJURO, PI 3336.0 Baker Island, Howland Island, Jarvis Island... NaN UM UMI
2573 United States 5.122496e+06 119328.543 KOROR/W CAROLINE ISLANDS PI 3758.3 Baker Island, Howland Island, Jarvis Island... NaN UM UMI
2574 United States 5.122496e+06 119328.543 YAP ISLAND/CAROLINE ISLANDS PI 3049.5 Baker Island, Howland Island, Jarvis Island... NaN UM UMI

2575 rows × 9 columns

3.1 Pre-procesamiento: datos de precipitación¶

Transformation: Aggregation¶

Creo un nuevo DF con el promedio de precipitacion anual en todo el país

In [342]:
DF.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2575 entries, 0 to 2574
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Country              2575 non-null   object 
 1   CO2_anual            2575 non-null   float64
 2   solar_energy_anual   2575 non-null   float64
 3   Station Name         2575 non-null   object 
 4   precipitation_anual  2575 non-null   float64
 5   Officialstatename    2575 non-null   object 
 6   InternetccTLD        1796 non-null   object 
 7   iso2                 2575 non-null   object 
 8   iso3                 2575 non-null   object 
dtypes: float64(3), object(6)
memory usage: 181.2+ KB
In [343]:
DF.shape
Out[343]:
(2575, 9)
In [344]:
DF = DF.loc[DF.precipitation_anual > 0]
len(DF)
Out[344]:
2555
In [345]:
max_prep = DF.groupby('Station Name')[['precipitation_anual']].agg(['max'])
max_prep.columns
Out[345]:
MultiIndex([('precipitation_anual', 'max')],
           )
In [346]:
max_prep[('precipitation_anual', 'max')].max()
Out[346]:
8888888.0
In [347]:
DF = DF.loc[DF.precipitation_anual < 10000]
len(DF)
Out[347]:
2554
In [348]:
prep_mean = DF.groupby('Country')[['precipitation_anual']].agg('mean')
prep_mean
Out[348]:
precipitation_anual
Country
Australia 764.802985
Austria 1105.285714
Belarus 622.833333
Belgium 821.200000
Bulgaria 528.714286
Canada 781.171179
Croatia 853.550000
Cyprus 320.000000
Czechia 569.450000
Denmark 727.277778
Estonia 612.333333
Finland 575.916667
France 736.062500
Germany 826.368421
Greece 624.800000
Hungary 560.571429
Ireland 1006.818182
Italy 778.709091
Japan 1731.690323
Latvia 666.000000
Lithuania 682.214286
Luxembourg 875.600000
Malta 553.300000
Netherlands 772.160000
New Zealand 1291.799094
Norway 860.461538
Poland 693.500000
Portugal 907.333333
Romania 583.454545
Russian Federation 497.791379
Slovakia 595.733333
Slovenia 1393.100000
Spain 626.490000
Sweden 589.812500
Switzerland 1628.000000
Türkiye 651.342857
Ukraine 568.000000
United Kingdom 857.700000
United States 940.355913
In [349]:
len(prep_mean)
Out[349]:
39

Cleaning¶

Elimino las siguientes columnas para luego realizar un merge con prep_mean

In [350]:
DF.drop(columns=['Station Name','precipitation_anual'],inplace=True)

Integration¶

Realizo un merge entre mi DF y prep_mean. Mantengo los elemento en común de la columna Country

In [351]:
DF = prep_mean.merge(DF,how='inner',left_on='Country',right_on='Country')
DF
Out[351]:
Country precipitation_anual CO2_anual solar_energy_anual Officialstatename InternetccTLD iso2 iso3
0 Australia 764.802985 2.781542e+05 21033.194 The Commonwealth of Australia .au AU AUS
1 Australia 764.802985 2.781542e+05 21033.194 The Commonwealth of Australia .au AU AUS
2 Australia 764.802985 2.781542e+05 21033.194 The Commonwealth of Australia .au AU AUS
3 Australia 764.802985 2.781542e+05 21033.194 The Commonwealth of Australia .au AU AUS
4 Australia 764.802985 2.781542e+05 21033.194 The Commonwealth of Australia .au AU AUS
... ... ... ... ... ... ... ... ...
2549 United States 940.355913 5.122496e+06 119328.543 Baker Island, Howland Island, Jarvis Island... NaN UM UMI
2550 United States 940.355913 5.122496e+06 119328.543 Baker Island, Howland Island, Jarvis Island... NaN UM UMI
2551 United States 940.355913 5.122496e+06 119328.543 Baker Island, Howland Island, Jarvis Island... NaN UM UMI
2552 United States 940.355913 5.122496e+06 119328.543 Baker Island, Howland Island, Jarvis Island... NaN UM UMI
2553 United States 940.355913 5.122496e+06 119328.543 Baker Island, Howland Island, Jarvis Island... NaN UM UMI

2554 rows × 8 columns

Cleaning¶

In [352]:
DF = DF.drop_duplicates()
DF
Out[352]:
Country precipitation_anual CO2_anual solar_energy_anual Officialstatename InternetccTLD iso2 iso3
0 Australia 764.802985 2.781542e+05 21033.194 The Commonwealth of Australia .au AU AUS
67 Austria 1105.285714 6.214525e+04 2042.934 The Republic of Austria .at AT AUT
74 Belarus 622.833333 1.083451e+05 176.000 The Republic of Belarus .by BY BLR
80 Belgium 821.200000 1.202926e+05 5105.400 The Kingdom of Belgium .be BE BEL
81 Bulgaria 528.714286 7.669920e+04 1480.856 The Republic of Bulgaria .bg BG BGR
88 Canada 781.171179 4.582183e+05 4846.000 Canada .ca CA CAN
317 Croatia 853.550000 2.297979e+04 95.500 The Republic of Croatia .hr HR HRV
319 Cyprus 320.000000 4.653217e+03 295.608 The Republic of Cyprus .cy CY CYP
320 Czechia 569.450000 1.642107e+05 2287.014 The Czech Republic .cz CZ CZE
324 Denmark 727.277778 5.487931e+04 1180.532 The Kingdom of Denmark .dk DK DNK
360 Estonia 612.333333 3.692221e+04 122.540 The Republic of Estonia .ee EE EST
363 Finland 575.916667 5.691434e+04 218.471 The Republic of Finland .fi FI FIN
375 France 736.062500 4.009646e+05 13398.392 The French Republic .fr FR FRA
391 Germany 826.368421 1.051979e+06 48641.000 The Federal Republic of Germany .de DE DEU
410 Greece 624.800000 8.343804e+04 4446.854 The Hellenic Republic .gr GR GRC
432 Hungary 560.571429 7.322554e+04 2459.000 Hungary .hu HU HUN
439 Ireland 1006.818182 3.294442e+04 63.780 Ireland .ie IE IRL
450 Italy 778.709091 4.395498e+05 24941.504 The Italian Republic .it IT ITA
549 Japan 1731.690323 1.158129e+06 79086.858 Japan .jp JP JPN
704 Latvia 666.000000 1.966140e+04 4.847 The Republic of Latvia .lv LV LVA
709 Lithuania 682.214286 3.576773e+04 128.800 The Republic of Lithuania .lt LT LTU
723 Luxembourg 875.600000 1.182335e+04 161.261 The Grand Duchy of Luxembourg .lu LU LUX
724 Malta 553.300000 2.394194e+03 236.837 The Republic of Malta .mt MT MLT
725 Netherlands 772.160000 1.618069e+05 8765.172 The Kingdom of the Netherlands .nl NL NLD
730 New Zealand 1291.799094 2.550251e+04 160.216 New Zealand .nz NZ NZL
1392 Norway 860.461538 3.509663e+04 26.521 The Kingdom of Norway .no NO NOR
1405 Poland 693.500000 3.768136e+05 1957.916 The Republic of Poland .pl PL POL
1423 Portugal 907.333333 4.532509e+04 1715.590 The Portuguese Republic .pt PT PRT
1435 Romania 583.454545 1.734639e+05 1733.375 Romania .ro RO ROU
1446 Russian Federation 497.791379 2.534865e+06 2022.400 The Russian Federation .ru RU RUS
1620 Slovakia 595.733333 6.147019e+04 663.000 The Slovak Republic .sk SK SVK
1623 Slovenia 1393.100000 1.509485e+04 368.196 The Republic of Slovenia .si SI SVN
1624 Spain 626.490000 2.313284e+05 20667.000 The Kingdom of Spain .es ES ESP
1654 Sweden 589.812500 5.758009e+04 1051.000 The Kingdom of Sweden .se SE SWE
1670 Switzerland 1628.000000 4.415988e+04 2598.659 The Swiss Confederation .ch CH CHE
1674 Türkiye 651.342857 1.516645e+05 10950.180 The Republic of Turkey .tr TR TUR
1709 Ukraine 568.000000 7.058300e+05 5969.400 Ukraine .ua UA UKR
1756 United Kingdom 857.700000 6.026521e+05 13157.994 The United Kingdom of Great Britain and Northe... .uk GB GBR
1776 United States 940.355913 5.122496e+06 119328.543 Baker Island, Howland Island, Jarvis Island... NaN UM UMI
In [353]:
DF.shape
Out[353]:
(39, 8)
In [354]:
DF.head()
Out[354]:
Country precipitation_anual CO2_anual solar_energy_anual Officialstatename InternetccTLD iso2 iso3
0 Australia 764.802985 278154.156296 21033.194 The Commonwealth of Australia .au AU AUS
67 Austria 1105.285714 62145.248288 2042.934 The Republic of Austria .at AT AUT
74 Belarus 622.833333 108345.055021 176.000 The Republic of Belarus .by BY BLR
80 Belgium 821.200000 120292.644131 5105.400 The Kingdom of Belgium .be BE BEL
81 Bulgaria 528.714286 76699.197427 1480.856 The Republic of Bulgaria .bg BG BGR
In [355]:
DF.info()
<class 'pandas.core.frame.DataFrame'>
Index: 39 entries, 0 to 1776
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Country              39 non-null     object 
 1   precipitation_anual  39 non-null     float64
 2   CO2_anual            39 non-null     float64
 3   solar_energy_anual   39 non-null     float64
 4   Officialstatename    39 non-null     object 
 5   InternetccTLD        38 non-null     object 
 6   iso2                 39 non-null     object 
 7   iso3                 39 non-null     object 
dtypes: float64(3), object(5)
memory usage: 2.7+ KB

4. Gráficas¶

In [356]:
import matplotlib.pyplot as plt

DF['precipitation_anual'].plot(kind='box')
plt.title('Variación de la precipitación media anual en diferentes países')
plt.show()
In [358]:
DF[['CO2_anual']].plot(kind='box')
plt.title('Variación de la emisión de CO2 en 1990 por diferentes países')
plt.show()
In [359]:
plt.bar(DF['Country'], DF['solar_energy_anual'])
plt.xticks(rotation=90)
plt.title('Contribución en la producción de energía solar en el 2020 por país')
plt.show()
In [360]:
plt.bar(DF['Country'], DF['CO2_anual'])
plt.xticks(rotation=90)
plt.title('Contribución en la emisión de CO2 en el año 1990 por país')
plt.show()
In [ ]: