How eliminate ALL extra code - Post...

User 2270470 Photo


Registered User
2 posts

How would I eliminate all the extra code added by any application?
I want to clean up the code and just have the minimum number of lines.
For example, when an MS Office app saves a file as HTML I end up with a lot of junk.
Yes I can manually clean up but I am looking for some automation.
Thanks for the help,
Dennis
User 38401 Photo


Senior Advisor
10,951 posts

Why are you making it in MS Office to begin with? Other than that, there's no way to do what you want to do; you will end up manually removing it, as those apps always add stupidity to their outputs. You'd be better off coding it yourself to start with rather than creating it in those apps.

Granted if you're not a coder by default that would be a little harder, but if you start with a premade theme half the battle is done there.

Anyways, not much you can do about it, as programs such as Word always add tons of unnecessary code and there are no built-in tools to remove it, sorry.
User 122279 Photo


Senior Advisor
14,610 posts

Dennis, Jo Ann is right about sites created in Word. And I know, the markup is monstrous!

In order to remove all the extra code that Word uses, just copy the text as you see it when you have a Word document open, and paste it into the CC html Editor. After that you need to add your own code, so that it looks nice.

If you have inserted images into your Word file, they have to be added separately into the HTML Editor, and not from Word, but from the original image file. And be aware that while in Word you can 'scale' images up and down by dragging the corners, you can't do that in the Editor. You need to scale images to the right size in an image editing programme before you add them to the Editor.
Have a really good day!
Inger, Norway

My work in progress:
Components for Site Designer and the HTML Editor: https://mock-up.coffeecup.com


User 2200796 Photo


Registered User
45 posts

I agree that it's best to do the layout and images in an HTML editor to begin with.

But realistically, there's a problem with saying, don't use a word processor and export HTML.

Most people, clients, designers, end-users, whoever they are, are going to have masses of text in word processor files that need to go into the web page or into an ebook format (which is a flavor of XHTML under the hood, with extra goodies). This may be creative writing, technical writing, advertising/marketing material, what have you. Telling someone to export that as text or copy and paste is one solution, yes, but then someone (the guy or girl doing the web pages) then has to go back and re-enter all the basic formatting (bold, italics, p and h tags, etc.) and then create the stylesheet and layout. Then add in images and other assets.

The frustrating trouble with that is, it's reinventing the wheel, repeating what, in one sense, has already been done. The designer or coder must re-code to achieve the same results they already had from the word processor file. It's similar if they try exporting from a page layout program like InDesign.

Yes, it can be done and that's likely the best method, but...aarrgh. It's going to frustrate people on both ends of the pipeline.

And yes, I've seen (and had to clean up) the monstrous, nightmarish, unclean(!) output from MS Word or from OpenOffice / LibreOffice Writer. Ow, I remember trying it with PageMaker output years ago. Ugh. -- I've also tried it when exporting files as text. Either way is not fun. Especially if the project was for chapters of a serial novel for online fiction. Or anything remotely technical. (How lost can I get among deeply nested td cells?) -- And when someone brought me *only* the HTML files without the original source .doc files and wanted me to edit the *text* as though it was a word processor file...oh, it's not nice.

Even the newest versions of Word and Writer still produce horrible HTML. Tags in all uppercase in HTML 4.01? Embedded inline styles?

I don't see the word processor or page layout programs doing anything reasonable about it any time soon.

What I think many designers/users would really appreciate is a code cleaner or file converter tool that imports a word processor file (.doc, .docx, .odt) or the (.htm, .html, xhtml) output from same, and transforms it into clean coded HTML and CSS files. Yes, I'm probably asking for pie in the sky and manna from heaven. Even some step towards that would help.
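
As a rough illustration of what a first step toward such a cleaner could look like, here is a minimal Python sketch that uses only the standard library's `html.parser`. The whitelist `KEEP`, the set `DROP_WITH_CONTENT`, and the names `WordHtmlCleaner` and `clean_word_html` are assumptions made up for this example, not part of any existing tool. It keeps basic formatting tags (and link targets), and throws away attributes, inline styles, comments, and Office-specific tags such as `o:p`. Real Word output has more quirks than this handles.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist of tags worth keeping; adjust to taste.
KEEP = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "b", "i", "strong", "em",
        "ul", "ol", "li", "br", "a", "blockquote"}
VOID = {"br"}  # kept tags that never get a closing tag
# Elements whose whole content is discarded (stylesheets, scripts, metadata).
DROP_WITH_CONTENT = {"head", "style", "script"}


class WordHtmlCleaner(HTMLParser):
    """Re-emit only whitelisted tags, with every attribute except href removed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in KEEP:
            href = dict(attrs).get("href") if tag == "a" else None
            if href:
                self.out.append(f'<a href="{escape(href)}">')
            else:
                self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in KEEP and tag not in VOID:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Character references were decoded by the parser; re-escape them.
            self.out.append(escape(data, quote=False))

    # Comments (including Office conditional comments) are ignored by default.


def clean_word_html(html: str) -> str:
    parser = WordHtmlCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


if __name__ == "__main__":
    messy = ("<p class=MsoNormal style='margin:0in'><b>"
             "<span style='font-size:12.0pt'>Hello</span></b> world<o:p></o:p></p>")
    print(clean_word_html(messy))
```

The whitelist approach is a deliberate choice: anything not explicitly listed is dropped, so unfamiliar Office markup disappears instead of leaking through. Tables and images are left out of the list here and would need their own handling.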

While I'm wishing for the moon, I could wish for a word processor that produces clean HTML5 and CSS3 and upcoming EPUB3 from the start, and never messes with (unclean!) horrible formats except when exporting/saving-as. -- Imagine it, if you never had that mess to start with, just nice, clean code.

Uh-huh, I'm off on cloud nine somewhere, aren't I? No strange (or unstrange) substances were consumed, either. ;) (Not my thing.)

It's the difference between what is and what should be. -- And why is it so difficult to output sensible, clean code from a word processor, anyway? Is it really that ugly in the actual word processor source files? (Probably.) (Most people who use word processors probably have no idea why stylesheets are a good idea. Most people who use a page layout program ought to know, but might not, unless they're a designer or have read up on it.)

Grumble, grouch, moan, groan. ;) -- Sorry, preaching to choir, I'm sure.
http://www.shinyfiction.com/
Writing, Editing, Artwork, Audio, and soon Fonts
User 38401 Photo


Senior Advisor
10,951 posts

Although I agree with you fully that people shouldn't "have" to copy things to a plain text editor such as Notepad, Notepad++, etc. to strip the code, it's the only way to get rid of all the extra code that programs like Word put into a document.

From the viewpoint of website building, you shouldn't be using those programs to style your text to begin with, simply because they add hidden code that transfers to other editors unless it's stripped. Granted, that doesn't make it easy by any means, but it's necessary if you don't want funky stuff happening on your pages. There's no middle ground here, and it simply "must" be stressed for the sake and the sanity of those helping people. I've been around that many, many times with people trying to figure out what's wrong, and I've had issues myself when copying directly from a website into the Shopping Cart Creator Pro. It messed me up extremely badly; working with the team for over a week, we couldn't track down what was wrong, and thankfully found it by accident!

Save your helpers headaches and copy anything that comes from an external source that is NOT a plain text editor and paste it into one so it strips the hidden code. Yes it's going to strip the styling, but you shouldn't be styling it in those programs to start with. Use VSD or your favorite visual editor if you're in need of visually styling things.
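
For anyone who wants to automate that paste-into-a-text-editor step on a saved .htm file, a minimal Python sketch along these lines might do. It uses only the standard library, and the names `TextOnly`, `to_plain_text`, `BLOCK_TAGS`, and `HIDDEN` are invented for this example. It throws away every tag and keeps just the visible text, one line per block element, which is roughly what a plain text editor leaves you with.

```python
from html.parser import HTMLParser

# Assumed set of tags that should start a new line in the text output.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
# Elements whose content is never visible text.
HIDDEN = {"head", "style", "script"}


class TextOnly(HTMLParser):
    """Collect visible text and drop all markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.hidden = 0  # > 0 while inside head/style/script

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN:
            self.hidden += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in HIDDEN:
            self.hidden = max(0, self.hidden - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.hidden:
            self.parts.append(data)


def to_plain_text(html: str) -> str:
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace (including non-breaking spaces) and
    # drop the empty lines left behind by nested block tags.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    print(to_plain_text("<p class=MsoNormal><b>Title</b></p>"
                        "<p style='margin:0in'>Body&nbsp;text <i>here</i>.</p>"))
```

As with pasting into Notepad, all styling is lost, so bold, italics, and headings have to be reapplied afterwards in the editor.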
User 464893 Photo


Ambassador
1,611 posts

I agree with Jo: if you are creating a static display, insert the text and then style its formatting.

Having said that, after some research I have found that a browser acts like any programming-language interpreter. When you think about it, it has to, otherwise it could not react to the tags you use. When the browser sees a starting tag, it processes the instructions, remembering some until it comes to the closing tag. In the same way, when it sees a script start tag, it will switch to, say, JavaScript or PHP mode. But on experiment I found it would treat an HTML starting tag the same way and process that until the closing tag.

I am using this to create my CMS action, which is totally browser friendly. However, it has to be done correctly. I am stating this not as what you should do but as a point of interest. All languages have a syntax, and to get the best out of a language (and HTML is just one), you have to work by its rules, even if some allow you to work slightly left of centre.
The Guy from OZ


User 474778 Photo


Registered User
215 posts

It's fascinating to look at the HTML that MS-Word generates. Microsoft spent tons of time trying for verbatim Web translation from WYSIWYG Word. However, it invariably disappoints. Best to do as Jo Ann suggests: Strip out the Word styling by copying & pasting into a .txt file, and then apply HTML markup and CSS styling appropriate to your page design.

To be fair, how is MS-Word supposed to "guess" how you intend to use its WYSIWYG content? For (a very simple) example, MS-Word doesn't "know" whether it should emit HTML appropriate to a flowing layout or to one with fixed-width columns. Or maybe it should override your carefully crafted CSS by deeming its own styling "important."

=====================

The reverse problem is also interesting. Try copy & paste of a rendered Web page into a Word document. It can be ugly. IMHO, LibreOffice / OpenOffice produces a more usable document than MS-Word does.
halfnium -AT- alum.mit.edu
Yes, I looked just like that in 1962.
User 464893 Photo


Ambassador
1,611 posts

Halfnium, I would think you would be into pushing the envelope. I agree with you in principle, but I think this conversion system hits the nail on the head:
IRUN RTFConverter Version 1.23 RTF1.5 Specifications to XML or HTML

I chose Jarte as a word processor because, one, it is free and, two, it uses those DLLs. Given time I might take them for a spin myself and put them through their paces to see how they could better fit in with CSS demands. To hard code formatted text is one thing, but to format on the fly is a completely different barrel of fish.

We agree on Open Office, but then they had all the MS Office foibles to guard against. Criticism is always easier after an event than before it occurs. I admit that in part I use the brilliance of other people's minds. In a way that is why I am here. I am a sponge; I soak up knowledge as best I can.

I will add again: if I were to create a static page I would hard code it, and I would try to keep the bloat to a minimum, but horses for courses.
The Guy from OZ


