I'm not really experienced with this sort of thing, but I have been reading a horribly typo-filled epub and finally decided to try to fix it. It's a public domain book (pretty sure in all countries), so I'm posting a copy here.
I tried using Tweak Book in calibre, so the first file has been edited. But go to page 143 of 1114 in the calibre book viewer (not sure how else to mark the place), and from there on it is in the original condition. I'm kind of mad I wasted the time on the first file, as that is about how far I'd already read on my Glo.
It seems to be one of those books that was scanned, run through a crappy OCR, and left at that. Since it's free and there aren't lots of copies of this book out there, I guess I can't complain; I had trouble enough finding this copy. BUT fixing it has been a real pain.
Even doing spell check is pretty annoying, since there is a lot of French in the book as well as a lot of names of people and places that don't repeat. I don't even know if I could do the spell check if I didn't already know how this author writes and how his work is usually translated.
On top of typos, there are innumerable extra spaces *everywhere*, and sentences lopped in half with a new page and a new paragraph marker. So I've been trying to do search and replace, but there are so many different cases.
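To make the two cases concrete, here is a rough Python sketch of the kind of search and replace I mean. It is only an illustration under assumptions: the function names are made up, it assumes paragraphs are plain `<p>` tags with no attributes, and it assumes every break looks exactly like the `<div class="newpage" id="page-N"/>` line from this book. It only joins a paragraph when the first half does not end in sentence punctuation and the second half starts with a lowercase letter, so it will miss some cases rather than guess.

```python
import re

# Runs of two or more spaces/tabs (the "extra spaces everywhere" case).
EXTRA_SPACES = re.compile(r"[ \t]{2,}")

# A paragraph that stops without sentence-ending punctuation, followed by a
# page-break div, followed by a paragraph that starts with a lowercase letter.
# Assumption: the markup matches the single example seen in this book.
SPLIT_PARAGRAPH = re.compile(
    r'([^\s.!?"\'>])\s*</p>\s*'
    r'<div class="newpage" id="page-\d+"/>\s*'
    r'<p>\s*(?=[a-z])'
)


def collapse_spaces(text: str) -> str:
    """Replace each run of spaces or tabs with a single space."""
    return EXTRA_SPACES.sub(" ", text)


def join_split_paragraphs(html: str) -> str:
    """Rejoin a sentence that was cut in half by a page break and a new <p>."""
    return SPLIT_PARAGRAPH.sub(r"\1 ", html)


if __name__ == "__main__":
    sample = (
        '<p>He walked   to the</p>\n'
        '<div class="newpage" id="page-6"/>\n'
        '<p>door and left.</p>'
    )
    print(join_split_paragraphs(collapse_spaces(sample)))
```

Something like this would have to be run on a copy of each file, and the results would still need a read-through, since a regex cannot tell a real paragraph end from a broken one in every case.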
So I guess the big question is: is there a simpler way to do what I've been doing? I've spent hours just doing the first file, and there are 8 or so. I could pretty much read the whole book in the process. I'm not at all experienced at this type of thing, so maybe I'm missing something, or maybe this situation just requires this much work?
I also was curious about 2 formatting things:
1) the <p> and </p> markers: is it the fact that the next ones are on a new line that is making the spaces appear between paragraphs on my Glo?
2) there are randomly placed new pages using this code:
<div class="newpage" id="page-6"/>
Is that page ID info pertinent to counting pages using the ADE method, or is it a totally unnecessary new page that I can get rid of (especially since they are never anywhere near new chapters)?
If an expert wants to take a look at one of the files (not part0000...unless you can correct my edited version even more :D) and give me any tips on what methods would work best in this type of situation, I would really appreciate it.