Exploratory Data Analysis of Turkey Earthquakes II

February 2, 2020

The main purpose of this study is to analyze the Turkey earthquake data set obtained from the online database of the Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center using data mining techniques. In this way, the study aims to raise awareness.

The data set includes earthquakes with magnitudes between 3.0 and 9.0. It has 50,000 observations and 15 variables, and it is a time series running from 1979 to 2019-10-29. The definitions of the variables in the data set are given below (Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center).

  1. No: Event Sequence.
  2. Event ID: Unique ID for the event [YYYYMMDDHHMMSS (YearMonthDayHourMinuteSecond)].
  3. Date: Date of event specified in the following format YYYY.MM.DD (Year.Month.Day).
  4. Origin Time: Origin time of event (UTC) specified in the following format HH:MM:SS.MS (Hour:Minute:Second.Millisecond).
  5. Latitude: in decimal degrees.
  6. Longitude: in decimal degrees.
  7. Depth(km): Depth of the event in kilometers.
  8. xM: Biggest magnitude value in specified magnitude values (MD, ML, Mw, Ms and Mb).
  9. MD ML Mw Ms Mb Type: Magnitude types (MD: Duration, ML: Local, Mw: Moment, Ms: Surface wave, Mb: Body-wave). 0.0 (zero) means no calculation for that type of magnitude.
  10. Location: Nearest settlement.
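
Because 0.0 stands for "no calculation" rather than a real magnitude, it can help to recode zeros as missing before summarizing the individual magnitude types. The following is a minimal base R sketch on made-up values (the `toy` data frame is hypothetical and not taken from the KOERI file); it also recomputes xM as the row-wise maximum, as described in item 8.

```r
# Hypothetical sample rows; the values are illustrative only
toy <- data.frame(
  MD = c(0.0, 3.1, 0.0),
  ML = c(3.0, 3.3, 0.0),
  Mw = c(2.9, 0.0, 5.2),
  Ms = c(0.0, 0.0, 5.0),
  Mb = c(0.0, 0.0, 4.9)
)
mag_cols <- c("MD", "ML", "Mw", "Ms", "Mb")

# Recode 0.0 ("no calculation") as NA so it is not treated as a magnitude
toy_na <- toy
toy_na[mag_cols] <- lapply(toy_na[mag_cols], function(x) replace(x, x == 0, NA))

# xM is the largest available magnitude in each row
toy_na$xM <- do.call(pmax, c(toy_na[mag_cols], na.rm = TRUE))

# Mean of each type over the rows where it was actually calculated
type_means <- colMeans(toy_na[mag_cols], na.rm = TRUE)
print(toy_na)
print(type_means)
```

Without the recoding, the zeros would pull every per-type mean towards zero.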

The exploratory analysis of the earthquake data set is given step by step in the next sections. The R programming language is used in the analysis.

Loading Libraries

library(readr)
library(tibble)
library(tidyr)
library(dplyr)
library(lubridate)
library(formattable)
library(ggplot2)
library(ggpubr)
library(GGally)
library(ggrepel)
library(tidyverse)
library(leaflet)
library(sf)
library(widgetframe)

Loading Data Set

df <- read_delim("data_bogazici.txt", 
    "\t", escape_double = FALSE, trim_ws = TRUE)
df<-as_tibble(df)

Classification by Year, Month, Day, Hour, Minute, and Second

#Drop the columns that are not needed: No, Deprem Kodu (event ID) and Tip (type)
df1<-df[,-c(1,2, 14)]
str(df1)
#Depth is read as character (e.g. "005.0"), so convert it to numeric
depth<-tibble(Depth= as.numeric(df1$`Der(km)`))
location<-as_tibble(df1[, 12])
#Split the date string (YYYY.MM.DD) into year, month and day
year<-tibble(Year=as.integer(substring(df1$`Olus tarihi`,1,4)))
month<-tibble(Month=as.integer(substring(df1$`Olus tarihi`,6,7)))
day<-tibble(Day=as.integer(substring(df1$`Olus tarihi`,9,10)))
#Split the origin time into hour, minute and second
hour<-tibble(Hour=as.integer(hour(df1$`Olus zamani`)))
minute<-tibble(Minute=as.integer(minute(df1$`Olus zamani`)))
second<-tibble(Second=as.integer(second(df1$`Olus zamani`)))
df2<-cbind(year, month, day, hour, minute, second, Latitude= as_tibble(df1[,3]),Longitude=as_tibble(df1[,4]), Depth= depth, Magnitude= as_tibble(df1[,6]), Location=location)
head(df2)
#Translate the remaining Turkish column names into English
df2<-df2 %>% rename(Latitude = Enlem, Longitude=Boylam, Location=Yer, Magnitude=xM)
#Add a column that categorizes the magnitudes of the earthquakes
df2<-mutate(df2, Magnitude_Class=cut(Magnitude, breaks=c(2.9, 4, 5, 6, 7, 8), labels=c("3-4", "4-5", "5-6", "6-7", "7-8")))
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	50000 obs. of  12 variables:
 $ Olus tarihi: chr  "2019.10.29" "2019.10.29" "2019.10.29" "2019.10.27" ...
 $ Olus zamani: 'hms' num  20:48:53 15:38:41 06:36:14 10:18:46 ...
  ..- attr(*, "units")= chr "secs"
 $ Enlem      : num  38.2 40.7 40.7 40.9 39.7 ...
 $ Boylam     : num  42.9 27.4 32.9 28.2 26.4 ...
 $ Der(km)    : chr  "005.0" "003.4" "007.4" "010.8" ...
 $ xM         : num  3 3.3 3.8 3.5 3.2 3 3.5 3.5 3 3.5 ...
 $ MD         : num  0 0 0 0 0 0 0 0 0 0 ...
 $ ML         : num  3 3.3 3.7 3.5 3.1 3 3.5 3.5 3 3.5 ...
 $ Mw         : num  2.9 3.1 3.8 3.3 3.2 2.8 3.4 3.4 2.9 3.4 ...
 $ Ms         : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Mb         : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Yer        : chr  "UNLUCE-BAHCESARAY (VAN) [North East  7.8 km]" "GUZELKOY ACIKLARI-TEKIRDAG (MARMARA DENIZI)" "HACILAR-CERKES (CANKIRI) [North West  4.1 km]" "SILIVRI ACIKLARI-ISTANBUL (MARMARA DENIZI)" ...
 - attr(*, "problems")=Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	2 obs. of  5 variables:
  ..$ row     : int  41487 43525
  ..$ col     : chr  "MD" "Olus zamani"
  ..$ expected: chr  "no trailing characters" "valid date"
  ..$ actual  : chr  "R" "10:03:73.00"
  ..$ file    : chr  "'data_bogazici.txt'" "'data_bogazici.txt'"
 - attr(*, "spec")=
  .. cols(
  ..   No = col_character(),
  ..   `Deprem Kodu` = col_double(),
  ..   `Olus tarihi` = col_character(),
  ..   `Olus zamani` = col_time(format = ""),
  ..   Enlem = col_double(),
  ..   Boylam = col_double(),
  ..   `Der(km)` = col_character(),
  ..   xM = col_double(),
  ..   MD = col_double(),
  ..   ML = col_double(),
  ..   Mw = col_double(),
  ..   Ms = col_double(),
  ..   Mb = col_double(),
  ..   Tip = col_character(),
  ..   Yer = col_character()
  .. )
  Year Month Day Hour Minute Second   Enlem  Boylam Depth  xM
1 2019    10  29   20     48     53 38.1520 42.9158   5.0 3.0
2 2019    10  29   15     38     41 40.7248 27.3940   3.4 3.3
3 2019    10  29    6     36     14 40.7342 32.9457   7.4 3.8
4 2019    10  27   10     18     46 40.8810 28.2057  10.8 3.5
5 2019    10  27    9     17     31 39.6660 26.3607   6.2 3.2
6 2019    10  27    8     18     53 40.8760 28.2063   6.1 3.0
                                                 Yer
1       UNLUCE-BAHCESARAY (VAN) [North East  7.8 km]
2        GUZELKOY ACIKLARI-TEKIRDAG (MARMARA DENIZI)
3      HACILAR-CERKES (CANKIRI) [North West  4.1 km]
4         SILIVRI ACIKLARI-ISTANBUL (MARMARA DENIZI)
5 CAKMAKLAR-AYVACIK (CANAKKALE) [South East  1.4 km]
6         SILIVRI ACIKLARI-ISTANBUL (MARMARA DENIZI)
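
How `cut()` treats boundary values matters for the class labels. By default the intervals are closed on the right, so with the breaks used above a magnitude of exactly 4.0 lands in "3-4" rather than "4-5", and any magnitude above 8 becomes `NA`. The sketch below shows this on made-up magnitudes (the vector `mags` is illustrative and not from the data set) and contrasts it with left-closed intervals as one possible alternative.

```r
mags <- c(3.0, 4.0, 4.1, 5.0, 6.5, 7.8, 8.5)
breaks <- c(2.9, 4, 5, 6, 7, 8)
labels <- c("3-4", "4-5", "5-6", "6-7", "7-8")

# Default: right-closed intervals (2.9,4], (4,5], ...
default_class <- cut(mags, breaks = breaks, labels = labels)

# Alternative: left-closed intervals [3,4), [4,5), ... so 4.0 falls in "4-5";
# include.lowest = TRUE also keeps a magnitude of exactly 8.0 in the last class
left_closed <- cut(mags, breaks = c(3, 4, 5, 6, 7, 8), labels = labels,
                   right = FALSE, include.lowest = TRUE)

print(data.frame(mags, default_class, left_closed))
```

Which convention is preferable depends on how the class labels are meant to be read; the analysis in this post uses the default.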

Density of earthquakes

df2%>%ggplot(aes(Year, Magnitude))+
geom_point(size=1, col="red")+
   ggtitle("Density of Earthquakes by Years") +
           xlab("Year") + ylab("Magnitude")+
   scale_x_continuous(breaks=seq(min(df2$Year),max(df2$Year), 4)) +
   labs(caption = "Data Source: Boğaziçi University KOERI Regional 
        Earthquake-Tsunami Monitoring Center")+
   theme(plot.title = element_text(family = "Trebuchet MS", face="bold", 
         size=14, hjust=0.5)) +
   theme(axis.title = element_text(family = "Trebuchet MS", face="bold", 
         size=12))+
   geom_hline(yintercept=mean(df2$Magnitude), linetype="twodash", color = 
              "green", size=1)+
   geom_hline(yintercept=4, linetype="twodash", color = "blue", size=1)+
   geom_hline(yintercept=5, linetype="twodash", color = "blue", size=1)+
   geom_hline(yintercept=6, linetype="twodash", color = "blue", size=1)+
   geom_hline(yintercept=7, linetype="twodash", color = "blue", size=1)

Number of earthquakes by years

year<-df2 %>% group_by(Year) %>% tally()
formattable (year)
year%>%ggplot(aes(Year, n))+
geom_line(size=1, col="red")+
  scale_x_continuous(breaks=seq(min(year$Year),max(year$Year), 10))+
   ggtitle("Number of Earthquakes by Years") +
           xlab("Year") + ylab("Number of Cases")+
  labs(caption = "Data Source: Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center")+
  theme(plot.title = element_text(family = "Trebuchet MS", face="bold", size=14, hjust=0.5)) +
theme(axis.title = element_text(family = "Trebuchet MS", face="bold", size=12))+
 geom_hline(yintercept=mean(year$n), linetype="twodash", color = "green", size=1)

Density of earthquakes by years

df2%>%ggplot(aes(Year, Magnitude, col=Magnitude_Class))+
geom_point(size=1)+
  geom_jitter()+
  facet_grid(Magnitude_Class~., scales="free")+
   ggtitle("Density of Earthquakes by Years") +
           xlab("Year") + ylab("Magnitude")+
  labs(caption = "Data Source: Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center")+
  theme(plot.title = element_text(family = "Trebuchet MS", face="bold", size=12, hjust=0.5)) +
theme(axis.title = element_text(family = "Trebuchet MS", face="bold", size=10))

Number of Earthquakes by Categories

year<-df2 %>% group_by(Year, Magnitude_Class) %>% tally()
formattable (year)
year%>%ggplot(aes(Year, n))+
geom_point(size=1, col="red")+
  facet_wrap(~Magnitude_Class,  ncol=2, scales="free")+
   ggtitle("Number of Earthquakes by Categories") +
           xlab("Year") + ylab("Number of Cases")+
  labs(caption = "Data Source: Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center")+
  theme(plot.title = element_text(family = "Trebuchet MS", face="bold", size=14, hjust=0.5)) +
theme(axis.title = element_text(family = "Trebuchet MS", face="bold", size=12))

The number of categories of earthquakes

year<-df2 %>% group_by(Magnitude_Class) %>% tally()
formattable (year)
year%>%ggplot(aes(Magnitude_Class, n))+
geom_point(size=1, col="red")+
  facet_grid(~Magnitude_Class)+
   ggtitle("Number of Earthquakes by Categories") +
           xlab("Categories of Earthquakes") + ylab("Number of Cases")+
  labs(caption = "Data Source: Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center")+
  theme(plot.title = element_text(family = "Trebuchet MS", face="bold", size=14, hjust=0.5)) +
theme(axis.title = element_text(family = "Trebuchet MS", face="bold", size=12))+
  geom_text_repel(aes(label=n), size=3, data=year) + theme(legend.position = "None")

Number of Cases by Months

month<-df2 %>% group_by(Month) %>% tally()
month<-month %>% select(Month, n)%>%
  arrange(desc(n))
month
month %>% ggplot(aes(Month, n))+
geom_line(size=1, col="brown")+
  scale_x_continuous(breaks=seq(1, 12, 1))+
   ggtitle("Number of Cases by Months") +
           xlab("Month") + ylab("Number of Cases")+
  labs(caption = "Data Source: Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center")+
  theme(plot.title = element_text(family = "Trebuchet MS", face="bold", size=14, hjust=0.5)) +
theme(axis.title = element_text(family = "Trebuchet MS", face="bold", size=12))+
  geom_hline(yintercept=mean(month$n), linetype="twodash", color = "red", size=1)

Number of Cases by Months and Categories

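#Regroup the magnitudes into three broader classes; df2 is overwritten so that the plots and maps below use these classes
df2<-mutate(df2, Magnitude_Class=cut(df2$Magnitude, breaks=c(2.9, 5, 6, 8), labels=c("3-5", "5-6", "6-8")))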
m<-df2 %>% group_by(Month, Magnitude_Class) %>% tally()
formattable (m)
m %>% ggplot(aes(Month, n))+
geom_point(size=1, col="red")+
   ggtitle("Number of Earthquakes by Categories") +
  scale_x_continuous(breaks=seq(1, 12, 1))+
           xlab("Month") + ylab("Number of Cases")+
  labs(caption = "Data Source: Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center")+
  theme(plot.title = element_text(family = "Trebuchet MS", face="bold", size=14, hjust=0.5)) +
theme(axis.title = element_text(family = "Trebuchet MS", face="bold", size=12))+
  geom_text_repel(aes(label=n), size=3, data=m) + theme(legend.position = "None")+
   facet_grid(Magnitude_Class~.)
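
Raw counts are dominated by the smallest magnitude class, so months are easier to compare within a class when the counts are turned into shares. The following base R sketch uses made-up counts (the `counts` data frame is hypothetical and covers only three months for brevity); the same idea could be applied to the table `m` above.

```r
# Made-up monthly counts per class, for illustration only
counts <- data.frame(
  Month = rep(1:3, times = 2),
  Magnitude_Class = rep(c("3-5", "5-6"), each = 3),
  n = c(400, 350, 250, 2, 3, 5)
)

# Share of each month within its own magnitude class (each class sums to 1)
counts$share <- ave(counts$n, counts$Magnitude_Class,
                    FUN = function(x) x / sum(x))
print(counts)
```

With very few events in the larger classes, such shares are noisy and should be read with caution.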

Number of Earthquakes by Hour

hour<-df2 %>% group_by(Hour) %>% tally()
hour<-hour %>% select(Hour, n)%>%
      arrange(desc(n))
hour %>% ggplot(aes(Hour, n))+
geom_line(size=1, col="red")+
  scale_x_continuous(breaks=seq(0, 24, 2))+
   ggtitle("Number of Cases by Hour") +
           xlab("Time") + ylab("Number of Cases")+
  labs(caption = "Data Source: Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center")+
  theme(plot.title = element_text(family = "Trebuchet MS", face="bold", size=14, hjust=0.5)) +
theme(axis.title = element_text(family = "Trebuchet MS", face="bold", size=12))+
  geom_hline(yintercept=mean(hour$n), linetype="twodash", color = "blue", size=1)
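
Since the origin times are given in UTC, the hourly counts describe the local time of day only after a time zone conversion. The following is a minimal base R sketch, assuming that the `Europe/Istanbul` zone from the system time zone database is the relevant local time; the three timestamps are the first rows of the output shown earlier.

```r
# Origin times in UTC (date and time joined into one string)
utc_times <- as.POSIXct(c("2019-10-29 20:48:53",
                          "2019-10-29 15:38:41",
                          "2019-10-29 06:36:14"),
                        tz = "UTC")

# Hour of day as recorded (UTC) and as experienced locally in Turkey
hour_utc <- as.integer(format(utc_times, "%H", tz = "UTC"))
hour_local <- as.integer(format(utc_times, "%H", tz = "Europe/Istanbul"))

print(data.frame(hour_utc, hour_local))
```

Using a named time zone instead of a fixed offset lets R apply the historical daylight saving rules for older events.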

Number of Earthquakes by Categories and Hour

h<-df2 %>% group_by(Hour, Magnitude_Class) %>% tally()
h%>%ggplot(aes(Hour, n))+
geom_point(size=1, col="red")+
  facet_wrap(~Magnitude_Class,  ncol=2, scales="free")+
   ggtitle("Number of Earthquakes by Categories") +
           xlab("Time") + ylab("Number of Cases")+
  labs(caption = "Data Source: Boğaziçi University KOERI Regional Earthquake-Tsunami Monitoring Center")+
  theme(plot.title = element_text(family = "Trebuchet MS", face="bold", size=14, hjust=0.5)) +
theme(axis.title = element_text(family = "Trebuchet MS", face="bold", size=12))

Map of the earthquakes with magnitudes between 5.0 and 6.0

(y <- df2 %>%
  filter(`Magnitude_Class` == "5-6"))
leaflet() %>%
  addTiles() %>%
  addMarkers(data = y, clusterOptions = markerClusterOptions())

Map of the earthquakes with magnitudes between 6.0 and 8.0

(y <- df2 %>%
  filter(`Magnitude_Class` == "6-8"))
leaflet() %>% addTiles() %>%
  addCircleMarkers(data=y,
    label=y$Magnitude,
    labelOptions = labelOptions(noHide = T, direction = 'top'))

Earthquakes of İstanbul City

istanbul<-df2 %>% filter(str_detect(Location, "ISTANBUL"))
leaflet() %>%
  addTiles() %>%
  addMarkers(data = istanbul, clusterOptions = markerClusterOptions())

Earthquakes of Manisa City

manisa<-df2 %>% filter(str_detect(Location, "MANISA"))
leaflet() %>%
  addTiles() %>%
  addMarkers(data = manisa, clusterOptions = markerClusterOptions())

Earthquakes of Elazığ City

elazig<-df2 %>% filter(str_detect(Location, "ELAZI"))
leaflet() %>%
  addTiles() %>%
  addMarkers(data = elazig, clusterOptions = markerClusterOptions())
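
The three city filters above follow the same pattern, which could be wrapped in a small helper. The sketch below uses base R only; the function name `filter_by_place` and the rows in `toy` are hypothetical and only mimic the style of the Location column.

```r
# Hypothetical helper: keep the rows whose Location contains a place name
filter_by_place <- function(data, place) {
  data[grepl(place, data$Location, fixed = TRUE), , drop = FALSE]
}

# Made-up rows in the same style as the Location column
toy <- data.frame(
  Location = c("SILIVRI ACIKLARI-ISTANBUL (MARMARA DENIZI)",
               "AKHISAR (MANISA)",
               "SIVRICE (ELAZIG)"),
  Magnitude = c(3.5, 4.8, 6.5)
)
print(filter_by_place(toy, "MANISA"))
```

Matching with `fixed = TRUE` treats the place name as a literal string, so characters such as parentheses in a search term are not interpreted as a regular expression.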

Density plot of magnitude

ggdensity(df2$Magnitude, 
          main = "Density plot of magnitude",
          xlab = "Magnitude")

Density plot of depth

ggdensity(df2$Depth, 
          main = "Density plot of depth",
          xlab = "Depth")

QQ plot of magnitude

ggqqplot(df2$Magnitude)

QQ plot of depth

ggqqplot(df2$Depth)

Kolmogorov-Smirnov Normality test

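#The Kolmogorov-Smirnov test is used in place of the Shapiro-Wilk test because the sample size exceeds 5000.
#Note: called with two samples, as here, ks.test() compares the distributions of magnitude and depth with each other rather than testing either one against a normal distribution.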
ks.test(df2$Magnitude, df2$Depth)
p-value will be approximate in the presence of ties
	Two-sample Kolmogorov-Smirnov test
data:  df2$Magnitude and df2$Depth
D = 0.8073, p-value < 2.2e-16
alternative hypothesis: two-sided
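
For comparison, a one-sample Kolmogorov-Smirnov test against a normal distribution can be sketched as follows. This is an illustration on simulated data, not a re-run of the analysis above, and the variable `x` is a made-up stand-in. Estimating the mean and standard deviation from the same sample makes the reported p-value only approximate.

```r
set.seed(1)
# Simulated stand-in for a skewed variable such as depth (not the real data)
x <- rexp(5000, rate = 0.1)

# One-sample test: compare x with a normal distribution that has
# the sample's own mean and standard deviation
res <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))
print(res)
```

A small p-value here indicates a departure from normality, which would favor rank-based methods such as Spearman's correlation over Pearson's.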

Correlation between depth of earthquake and magnitude of earthquake

ggscatter(df2, x = "Magnitude", y = "Depth", 
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "Magnitude", ylab = "Depth", main="Correlation between depth of earthquake and magnitude of earthquake")

Correlation Analysis

#The correlation is statistically significant but weak (r is about 0.15): there is no strong linear relationship between depth and magnitude
cor.test(df2$Magnitude, df2$Depth, 
                    method = "pearson")
	Pearson's product-moment correlation
data:  df2$Magnitude and df2$Depth
t = 33.943, df = 49998, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.1415024 0.1586381
sample estimates:
      cor 
0.1500815 
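
The reported statistics can be reproduced from the correlation coefficient and the sample size alone, which also shows why such a weak correlation is still highly significant with 50,000 observations. The sketch below uses the standard formulas (t statistic with n - 2 degrees of freedom and a confidence interval via the Fisher z-transformation); the variable names are illustrative.

```r
r <- 0.1500815   # sample correlation reported above
n <- 50000       # number of observations

# t statistic for H0: true correlation = 0, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z-transformation
z <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

# Share of variance in one variable linearly explained by the other
r_squared <- r^2

print(c(t = t_stat, lower = ci[1], upper = ci[2], r_squared = r_squared))
```

The squared correlation is only about 0.02, so depth and magnitude share roughly 2 percent of their variance in a linear sense.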

Conclusion

This study aimed to conduct an exploratory data analysis of Turkey earthquakes using data mining techniques. The descriptive statistics suggest that earthquakes occur most often at night and in the evening. Earthquakes with magnitudes from 5.0 to 8.0 are also observed to be more frequent between the 10th and 12th months than in the other months.

The findings show no strong correlation between the depth and the magnitude of earthquakes (Pearson's r = 0.15). Factors such as soil and rock structure may have affected this relationship and also need to be evaluated.

I hope this work helps to create awareness.

I dedicate this work to our citizens who died in the earthquake.

References

https://rpubs.com/tevfik1461/Turkey

https://tevfikbulut.com/2020/01/31/exploratory-data-analysis-of-turkey-earthquakes/

https://www.r-project.org/

https://cfss.uchicago.edu/notes/raster-maps-with-ggmap/

http://www.koeri.boun.edu.tr/sismo/zeqdb/indexeng.asp

http://www.koeri.boun.edu.tr/sismo/zeqdb/

© Copyright 2022 Tevfik Bulut