[Troubleshooting] PyTorch model save runtime error

devim 2024. 1. 22. 22:15

I built a simple MLP model with PyTorch,

then set a file path so that the model with the lowest loss value would be saved there.

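I repeated a 100000-epoch training run on data of size N a total of k times,

and at some arbitrary epoch n of the k-th run, a RuntimeError kept occurring, saying the file the model was saved to could not be opened.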

I searched for the error message below and, fortunately, found someone who had run into a similar problem.

RuntimeError: File MODEL_SAVE_PATH.pth cannot be opened.

https://stackoverflow.com/questions/74925031/runtime-error-while-saving-a-pytorch-model-file-path-to-be-saved-cannot-be-op

Runtime Error while Saving a PyTorch Model: "File /path/to/be/saved Cannot Be Opened"

One answer to that question reported that changing the code as follows solved the problem.

Before:

# file name built from the current timestamp
model_path = os.path.join(model_dir, str(datetime.datetime.now())+".pth")
torch.save(self.state_dict(), model_path)

After:

# fixed file name built from plain string literals
model_path = os.path.join(model_dir, "model"+".pth")
torch.save(self.state_dict(), model_path)

I figured the change worked because the file name passed as the argument was now built from plain string operations alone, without calling any other library.
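An alternative explanation, offered only as an assumption that this post does not test: `str(datetime.datetime.now())` returns text in the form `2024-01-22 22:15:03.123456`, which contains `:` characters, and `:` is not allowed in Windows file names. The sketch below builds a timestamped name with `strftime` instead; the helper name `timestamp_filename` and the `checkpoints` directory are hypothetical.

```python
import datetime
import os


def timestamp_filename(prefix: str = "model", ext: str = ".pth") -> str:
    """Return a name like model_20240122_221503_123456.pth (hypothetical helper)."""
    # An explicit strftime format avoids the space and ':' characters
    # that str(datetime.datetime.now()) would put into the file name.
    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    return f"{prefix}_{stamp}{ext}"


raw_name = str(datetime.datetime.now()) + ".pth"      # "Before" style: contains ':'
safe_name = timestamp_filename()                      # colon-free alternative
model_path = os.path.join("checkpoints", safe_name)   # "checkpoints" is a hypothetical dir
print(raw_name)
print(model_path)
```

With a name built this way, `torch.save(self.state_dict(), model_path)` would be called exactly as in the "Before" snippet.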

My own code also built the file name by calling format on a string variable named PATH,

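import torch

# PATH is a str file-path template; its actual value is not shown in this post
checkpoint = torch.load(PATH)               # load the previously saved checkpoint
model.load_state_dict(checkpoint['model'])  # restore the saved weights into model

state = {
    'model': model.state_dict()
}
torch.save(state, PATH.format(epoch))       # str.format fills the epoch into the file name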

and when I simplified it as below, the problem went away.

torch.save(state, PATH)

+) I thought the problem was solved, but now the same error has happened again :(

+) Even according to the official PyTorch docs the first approach is correct, so I don't understand why it fails!!!!

https://tutorials.pytorch.kr/beginner/saving_loading_models.html

Saving and Loading Models

+) Solved: I had been repeatedly running code that saves the current model state whenever the loss reaches a new minimum,

and because of this repeated execution, the same file was overwritten many times within a short period, which caused a conflict.

(Especially in my case, because I trained many times inside several loops, it effectively became N repetitions × 10000 epochs of training.)
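Separately from the fix this post settles on, a common general mitigation for rapid overwrites is to write each checkpoint to a temporary file in the same directory and then move it over the target with `os.replace`, so the target path never points at a half-written file. This is only a sketch under that assumption: `atomic_save` is a hypothetical helper, it uses `pickle.dump` by default so it runs without PyTorch, and on Windows `os.replace` can still fail if another process holds the target file open.

```python
import os
import pickle
import tempfile


def _pickle_save(obj, file_path):
    # Default writer so this sketch runs without PyTorch installed.
    with open(file_path, "wb") as f:
        pickle.dump(obj, f)


def atomic_save(obj, path, save_fn=_pickle_save):
    """Write obj to a temp file, then move it over path (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Put the temp file in the target's directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)  # release our handle; save_fn reopens the file by its path
    try:
        save_fn(obj, tmp_path)
        os.replace(tmp_path, path)  # swap the finished file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # do not leave half-written temp files behind
        raise


# Usage: overwrite the same checkpoint path several times in a row.
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "model.pth")
    for epoch in range(5):
        atomic_save({"epoch": epoch}, target)
    with open(target, "rb") as f:
        last = pickle.load(f)
    leftovers = [name for name in os.listdir(d) if name.endswith(".tmp")]

print(last)       # {'epoch': 4}
print(leftovers)  # []
```

Since `torch.save` takes `(obj, f)` with `f` a path, a PyTorch call would presumably look like `atomic_save(state, PATH, save_fn=torch.save)`.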

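Therefore, assuming 10000 epochs of training in total: where I previously saved the model whenever the loss reached a new minimum after epoch 3000, saving it only when the loss reaches a new minimum after epoch 8000 solves the problem.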
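The fix can be sketched as a small tracker that remembers the lowest loss seen so far and only triggers a save once training has passed a chosen epoch. `BestCheckpoint`, `start_epoch`, `count_saves`, and the synthetic loss curve are assumptions for illustration; in real training, `save_fn` would be something like `lambda: torch.save(state, PATH)`.

```python
from typing import Callable


class BestCheckpoint:
    """Track the minimum loss and save only from start_epoch on (hypothetical helper)."""

    def __init__(self, start_epoch: int, save_fn: Callable[[], None]):
        self.start_epoch = start_epoch
        self.save_fn = save_fn
        self.best_loss = float("inf")
        self.num_saves = 0

    def step(self, epoch: int, loss: float) -> bool:
        """Record this epoch's loss; return True if a checkpoint was written."""
        if loss >= self.best_loss:
            return False
        self.best_loss = loss  # new minimum, tracked even before start_epoch
        if epoch < self.start_epoch:
            return False       # too early: skip the disk write
        self.save_fn()
        self.num_saves += 1
        return True


def count_saves(start_epoch: int, total_epochs: int = 10000) -> int:
    # Synthetic, strictly decreasing loss: every epoch is a new minimum (worst case).
    tracker = BestCheckpoint(start_epoch, save_fn=lambda: None)
    for epoch in range(total_epochs):
        tracker.step(epoch, 1.0 / (epoch + 1))
    return tracker.num_saves


print(count_saves(3000))  # 7000
print(count_saves(8000))  # 2000
```

In this worst case, moving the threshold from epoch 3000 to epoch 8000 cuts the number of writes to the same file from 7000 to 2000, which is consistent with fewer, less frequent overwrites making the error go away.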
