[Big Data Analysis] R _ 38. Data Visualization 5 (Histograms)

chonny 2024. 7. 2. 17:19

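Difference from a bar graph

Bar graph	Histogram
Compares the quantity or frequency of individual items (e.g. by occupation, by year)	Visualizes the distribution of continuous data (e.g. test score distribution)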
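The distinction in the table can be sketched with R's built-in mtcars dataset, which is an assumption made only for illustration and is not part of the course data. Here cyl is treated as a category and mpg as a continuous variable.

```r
# Built-in dataset used only for illustration (not the course data)
cyl_counts <- table(mtcars$cyl)   # frequency of each category

# Bar graph: one bar per category
barplot(cyl_counts, xlab = "Cylinders", ylab = "Count")

# Histogram: continuous values are grouped into bins first
h <- hist(mtcars$mpg, xlab = "Miles per gallon", main = "mpg")
```

The bar graph needs the counts to be computed beforehand with table(), while hist() does the binning itself.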

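Problem 1. Draw a histogram from the weather data.
## Load the data and check its structure
setwd('c:\\data')
weather <- read.csv('weather.csv', header = TRUE, fileEncoding = 'euc-kr')
weather

nrow(weather)  # 472
ncol(weather)  # 8
str(weather)

## Select the data to visualize
ahot <- weather$평균기온  # 평균기온 = average temperature
ahot

## Draw the histogram
histo <- hist(ahot)
histo  # prints the returned object (breaks, counts, density, mids)

## Adjust the bins
histo4 <- hist(ahot, breaks = 50)
histo4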
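When breaks is a single number, hist() treats it only as a suggestion and picks "pretty" break points, so the number of bins may not be exactly 50. A vector of break points is used as given. A small sketch with synthetic values (an assumption, standing in for the temperature column):

```r
set.seed(1)
x <- rnorm(200, mean = 25, sd = 3)   # synthetic stand-in for temperatures

# plot = FALSE returns the histogram object without drawing it
h_num <- hist(x, breaks = 50, plot = FALSE)
h_vec <- hist(x, breaks = seq(floor(min(x)), ceiling(max(x)), by = 1),
              plot = FALSE)

length(h_num$counts)   # number of bins actually used; may differ from 50
diff(h_vec$breaks)     # every bin is exactly 1 wide
```

Inspecting the breaks and counts components is a quick way to check what binning was actually applied.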


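## Adjust the bins 2
hist5 <- hist(ahot, breaks = seq(10, 40, by = 1))
hist5
Explanation: the x-axis is the average temperature (between 10 and 40 degrees) and the y-axis is the frequency in each bin.
-> Most of the data are concentrated between 24 and 29 degrees.
-> In terms of skewness, the distribution is not symmetric and leans slightly to the left.
-> A probability density plot can be used to visualize this skew.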
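Skewness can also be checked numerically. Base R has no built-in skewness function, so the sketch below defines a hypothetical helper, skewness_simple, from the moment formula and applies it to synthetic data (not the weather data). A positive value means the long tail is on the right, a negative value means it is on the left.

```r
# Hypothetical helper: moment-based sample skewness (base R has no built-in)
skewness_simple <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

set.seed(42)
right_tail <- rexp(1000)     # long tail on the right -> positive skewness
left_tail  <- -right_tail    # mirrored: long tail on the left -> negative

skewness_simple(right_tail)
skewness_simple(left_tail)
```

Applying the same helper to ahot would give a number to set beside the visual impression from the histogram.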
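## Probability density histogram
hist(ahot, breaks = seq(10, 40, by = 1), col = 'grey', border = 'white',
     prob = TRUE, ylim = c(0, 0.3))
# Draw the probability density curve on top of the density histogram
lines(density(ahot), col = "red")
※ What does "probability density" mean in a probability density histogram?
-> It is a function that describes the distribution of a continuous random variable (*it gives the probability that the variable falls within an interval, not at a single value).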
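One way to see the "interval, not single value" point: in a density histogram the area of each bar (height times bin width) is the share of the data in that bin, and the areas add up to 1. A minimal sketch with synthetic values, assumed only for illustration:

```r
set.seed(7)
x <- runif(500, min = 10, max = 40)   # synthetic values, not the weather data

h <- hist(x, breaks = seq(10, 40, by = 2), plot = FALSE)

bin_width <- diff(h$breaks)           # width of each bin (2 here)
bin_prob  <- h$density * bin_width    # bar area = share of data in that bin

sum(bin_prob)                         # 1, up to floating-point rounding
```

This is why the y-axis values of a density histogram depend on the bin width, while the total area does not.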

Problem 2. Draw a histogram from the used-car data.

# Load the data
usedcars <- read.csv('c:\\data\\usedcars.csv', header = TRUE, fileEncoding = 'euc-kr')

# Select the used-car price column
p <- usedcars$price

# Check the minimum and maximum price
min_price <- min(p)
max_price <- max(p)

# Round the ends outward to multiples of 1000 so the breaks cover every price
# (hist() stops with an error if a value falls outside the breaks)
breaks <- seq(floor(min_price / 1000) * 1000, ceiling(max_price / 1000) * 1000, by = 1000)

# Draw the probability density histogram
hist(p, breaks = breaks, col = 'grey', border = 'white', prob = TRUE, ylim = c(0, 0.0003),
     xlab = 'Price', ylab = 'Density', main = 'Probability Density Histogram of Used Car Prices')

# Add the density curve
lines(density(p), col = 'red')

by = 1000 sets the bin width (5000, 6000, 7000, 8000, 9000, ...)

Graph interpretation: the proportions on both sides of the center are roughly even.
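hist() stops with an error if any value lies outside the supplied breaks, so the break points have to span the whole price range. One way to get 1000-wide bins that land on round numbers is sketched below with made-up prices. The names p_demo, bin, lo and hi are illustrative, not from the original code.

```r
# Hypothetical prices, chosen only to illustrate the arithmetic
p_demo <- c(3800, 7995, 12500, 13592, 21992)
bin    <- 1000

# Round the ends outward to multiples of the bin width
lo <- floor(min(p_demo) / bin) * bin      # 3000
hi <- ceiling(max(p_demo) / bin) * bin    # 22000
breaks_demo <- seq(lo, hi, by = bin)

# Every value now falls inside the breaks, so hist() accepts them
h <- hist(p_demo, breaks = breaks_demo, plot = FALSE)
sum(h$counts)                             # 5, nothing was dropped
```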

Syntax 2. Draw a histogram of the used-car price data using plotly.


# Load the plotly package
library(plotly)

# Load the data (assumes the working directory is c:\data, as set earlier)
data <- read.csv("usedcars.csv")

# Create the histogram with plotly
fig <- plot_ly(data, x = ~price, type = 'histogram', marker = list(color = 'blue'))

# Set the graph layout
fig <- fig %>%
  layout(title = 'Histogram of Used Car Prices',
         xaxis = list(title = 'Price'),
         yaxis = list(title = 'Count'))

# Display the graph
fig

Syntax 3. Make the y-axis show probability instead of frequency.

# Load the plotly package
library(plotly)

# Load the data (assumes the working directory is c:\data, as set earlier)
data <- read.csv("usedcars.csv")

# Create the histogram with plotly; histnorm rescales the bar heights
fig <- plot_ly(data, x = ~price, type = 'histogram', marker = list(color = 'blue'),
               histnorm = 'probability')

# Set the graph layout
fig <- fig %>%
  layout(title = 'Probability Histogram of Used Car Prices',
         xaxis = list(title = 'Price'),
         yaxis = list(title = 'Probability'))

# Display the graph
fig
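With histnorm = 'probability', each bar shows the bin count divided by the total count, so the bar heights sum to 1. That differs from a density, which is also divided by the bin width. The arithmetic can be sketched in base R with made-up prices (an assumption for illustration):

```r
prices <- c(4500, 5200, 5900, 7100, 7400, 7800, 8300, 9900)  # made-up values
h <- hist(prices, breaks = seq(4000, 10000, by = 2000), plot = FALSE)

prob <- h$counts / sum(h$counts)   # what a 'probability' y-axis shows
dens <- h$density                  # probability divided by bin width

data.frame(bin_start = head(h$breaks, -1),
           count = h$counts, prob = prob, dens = dens)
```

Here the three bins hold 3, 3 and 2 values, so the probabilities are 0.375, 0.375 and 0.25.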

★ Final problem. Using the graphs drawn today, visualize the SQL portfolio data.

# Load the plotly and dplyr packages
library(plotly)
library(dplyr)

# Load the data
birth <- read.csv("c:\\data\\birth_table.csv", header = TRUE, fileEncoding = 'euc-kr')

# Check the data structure
print(head(birth))

# Added to the earlier code: sum birth_cnt by year and com_size

a <- birth %>%
  group_by(year, com_size) %>%
  summarise(total_birth = sum(birth_cnt), .groups = 'drop')

print(a)

# Create a line graph with plotly (one color per com_size level)
fig <- plot_ly(a, 
               x = ~year, 
               y = ~total_birth, 
               color = ~com_size, 
               colors = c('blue', 'red', 'gold', 'purple'),
               type = 'scatter', 
               mode = 'lines+markers')

# Set the graph layout
fig <- fig %>%
  layout(title = 'Births by Workplace Size',
         xaxis = list(title = 'YEAR'),
         yaxis = list(title = 'TOTAL_BIRTH'))

# Display the graph
fig
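The grouping step can be checked on a tiny made-up table before plotting. The sketch below uses base R's aggregate() to compute the same per-group sums. The column names mirror those in the code above, but all values are invented.

```r
# Made-up rows; only the column names come from the code above
toy <- data.frame(
  year      = c(2020, 2020, 2020, 2021, 2021),
  com_size  = c("small", "small", "large", "small", "large"),
  birth_cnt = c(10, 5, 7, 3, 8)
)

# Sum birth_cnt within each (year, com_size) pair
totals <- aggregate(birth_cnt ~ year + com_size, data = toy, FUN = sum)
totals
```

The two "small" rows for 2020 collapse into a single row with a total of 15, which is the shape the line graph expects: one point per year and group.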
