|
Home | Switchboard | Unix Administration | Red Hat | TCP/IP Networks | Neoliberalism | Toxic Managers |
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix |
News | Recommended Links | Webmaster Toolset | Perl HTML Processors and Converters |
|
Since most documents in the world are getting converted to HTML format, and in many case they are tranformed form different format using some kind of automated tool, the HTML beautifier is immensely important.
FrontPage provides good HTML pretty printer. It is also available in its free version SharePoint Designer 2007
The other well established package with known (pretty high) quality is tidy. It is written is C and as such it is difficult to maintain, but it does its job well.
tidy is available as a package and can be installed in Cygwin, so it can be used in Windows too. Cygwin for some reason wants to install Gnome with it :-)
|
Switchboard | ||||
Latest | |||||
Past week | |||||
Past month |
|
Jan 01, 2019 | stackoverflow.com
A command-line HTML pretty-printer: Making messy HTML readable [closed] Ask Question Asked 10 years, 1 month ago Active 10 months ago Viewed 51k times
knorv ,
Closed. This question is off-topic . It is not currently accepting answers.jonjbar ,
Have a look at the HTML Tidy Project: http://www.html-tidy.org/The granddaddy of HTML tools, with support for modern standards.
There used to be a fork called tidy-html5 which since became the official thing. Here is its GitHub repository .
Tidy is a console application for Mac OS X, Linux, Windows, UNIX, and more. It corrects and cleans up HTML and XML documents by fixing markup errors and upgrading legacy code to modern standards.
For your needs, here is the command line to call Tidy:
tidy inputfile.htmlPaul Brit ,
Update 2018: Thehomebrew/dupes
is now deprecated, tidy-html5 may be directly installed.brew install tidy-html5
Original reply:
Tidy
from OS X doesn't supportHTML5
. But there is experimental branch onGithub
which does.To get it:
brew tap homebrew/dupes brew install tidy --HEAD brew untap homebrew/dupesThat's it! Have fun!
Boris , 2019-11-16 01:27:35
Error: No available formula with the name "tidy"
.brew install tidy-html5
works. – Pysis Apr 4 '17 at 13:34
Stack Overflow
Perl: HTML::PrettyPrinter - Handling self-closing tags
I am a newcomer to Perl (Strawberry Perl v5.12.3 on Windows 7), trying to write a script to aid me with a repetitive HTML formatting task. The files need to be hand-edited in future and I want them to be human-friendly, so after processing using the HTML package (HTML::TreeBuilder etc.), I am writing the result to a file using HTML::PrettyPrinter. All of this works well and the output from PrettyPrinter is very nice and human-readable. However, PrettyPrinter is not handling self-closing tags well; basically, it seems to be treat the slash as an HTML attribute. With input like:
<img />PrettyPrinter returns:
<img /="/" >Is there anything I can do to avoid this other than preprocessing with a regex to remove the backslash?
Not sure it will be helpful, but here is my setup for the pretty printing:
my $hpp = HTML::PrettyPrinter->new('linelength' => 120, 'quote_attr' => 1);
$hpp->allow_forced_nl(1);my $output = new FileHandle ">output.html";
if (defined $output) {
$hpp->select($output);
my $linearray_ref = $hpp->format($internal);
undef $output;
$hpp->select(undef),
}perl html-formatting
shareimprove this question
asked Jan 18 '12 at 15:18
SenatorForLife
305
add a comment
1 Answer
active oldest votes
up vote1
down vote
accepted
You can print formatted human readable html with TreeBuilder method:
$h = HTML::TreeBuilder->new_from_content($html);
print $h->as_HTML('',"\t");but if you still prefer this bugged prettyprinter try to remove problem tags, no idea why someone need ...
$h = HTML::TreeBuilder->new_from_content($html);
while(my $n = $h->look_down(_tag=>img,'src'=>undef)) { $n->delete }UPD:
well... then we can fix the PrettyPrinter. It's pure perl module so lets see... No idea where on windows perl modules are for me it's /usr/local/share/perl/5.10.1/HTML/PrettyPrinter.pm
maybe not an elegant solution, but will work i hope. this sub parse attribute/value pairs, a little fix and it will add single '/' at the end
~line 756 in PrettyPrinter.pm I've marked the stings that i added with ###<<<<<< at the end
#
# format the attributes
#
sub _attributes {
my ($self, $e) = @_;
my @result = (); # list of ATTR="value" strings to returnmy $self_closing = 0; ###<<<<<<
my @attrs = $e->all_external_attr(); # list (name0, val0, name1, val1, ...)while (@attrs) {
my ($a,$v) = (shift @attrs,shift @attrs); # get current name, value pair
if($a eq '/') { ###<<<<<<
$self_closing=1; ###<<<<<<
next; ###<<<<<<
} ###<<<<<<# string for output: 1. attribute name
my $s = $self->uppercase? "\U$a" : $a;.# value part, skip for boolean attributes if desired
unless ($a eq lc($v) &&
$self->min_bool_attr &&.
exists($HTML::Tagset::boolean_attr{$e->tag}) &&
(ref($HTML::Tagset::boolean_attr{$e->tag}).
? $HTML::Tagset::boolean_attr{$e->tag}{$a}.
: $HTML::Tagset::boolean_attr{$e->tag} eq $a)) {
my $q = '';
# quote value?
if ($self->quote_attr || $v =~ tr/a-zA-Z0-9.-//c) {
# use single quote if value contains double quotes but no single quotes
$q = ($v =~ tr/"// && $v !~ tr/'//) ? "'" : '"'; # catch emacs ");
}
# add value part
$s .= '='.$q.(encode_entities($v,$q.$self->entities)).$q;
}
# add string to resulting list
push @result, $s;
}push @result,'/' if $self_closing; ###<<<<<<
return @result; # return list ('attr="val"','attr="val"',...);
}
Thanks for the answer. However, neither of these is a solution. Even with the tab indention option you suggested, print $h->as_HTML still runs things together in odd ways that a human never would (for example, all of the h2's are run together on the same line with the preceding p tag). Hence the use of PrettyPrinter. I think you misunderstood my miminal example with regard to PrettyPrinter. There is nothing wrong with my img tags--PrettyPrinter prints all self-closing tags as standard tags with a / property set to "/", e.g. <br /> becomes <br /="/"> � SenatorForLife Jan 19 '12 at 15:05
i updated the post about how to fix the module. hope this will help � Dimanoid Jan 19 '12 at 19:20
This seems to work very nicely, thank you! I'm definitely too much of a Perl newbie to have figured out the hack on my own. For other Strawberry Perl users, here's where HTML::PrettyPrinter showed up on my machine after being installed with cpanm: C:\strawberry\perl\site\lib\HTML\PrettyPrinter.pm � SenatorForLife Jan 19 '12 at 20:20
Thanks. Your fix worked for me. � AJ Dhaliwal Jun 19 '12 at 7:05
Stack Overflow
I have some HTML output sitting in a variable that I will like to Prettify / Beautify but struggling to make sense out of the results of my web searches.
The example in the HTML::Tidy documentation shows it uses the source parameter as a variable (which is what was requested). � Thomas Dickey Mar 28 '15 at 10:55
HTML::Tidy does exactly that:
#!/usr/bin/perl
use strict;
use warnings;use HTML::Tidy;
my $str = '<div><p>Text<h2>Heading</h2>';
my $tidy = 'HTML::Tidy'->new;
print $tidy->clean($str);output
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<meta name="generator" content="tidyp for Linux (v1.04), see www.w3.org">
<title></title>
</head>
<body>
<div>
<p>Text</p>
<h2>Heading</h2>
</div>
</body>
</html>Given that the HTML is not under your control, you might want to also consider HTML::PrettyPrinter if HTML::TreeBuilder generates the correct syntax tree for your HTML.
I've been working on this tool for a while. It's not quite done yet, but it's definitely at a useful stage. I call it the tabifier, but in truth it's a code beautifier.
The overarching design goal for this tool was to beautify HTML code without breaking it. This is of course not totally possible, but I've strived to get as close as possible. There are great HTML beautifiers out there like HTML Tidy but they generally do too much.
When I take an ugly HTML page that someone else has written and I don't know, I want to pass it through a beautifier to make it easier to work with. Things like tidy, though, will drastically alter the code, often making it more work to turn the result of the beautification back into something that displays like the original than it would be to just plain fix it.
htmlpp is a simple HTML pretty printer, based on nsgmls and SGMLS.pm. The code is pretty alpha, but gives attractive results for many HTML docs. Some things, like nested tables, are rendered only passably. Other deeply-nested structures may render badly as well.
Note that this pretty-printer is oldish, and alpha, and unlikely to be developed any further. It's not a bad illustration of some of the possibilities for SGML technology in web authoring. Perhaps someone will take up the challenge, and build the "right" tool!
Since htmlpp gets its input from nsgmls, invalid documents should not be expected to work. However, a side effect of this approach is that minor errors and inconsistencies are actually fixed. Attribute values are always quoted in the pretty printed version. Characters like "<", ">" and "&" are converted into the appropriate SGML entities in attribute values and in document text. End tags are inserted automatically -- which will surprise you if you thought it was legal to imbed <pre> elements inside <p> elements, for example.
use LWP::Simple;
use HTML::Parse;
use HTML::Entities;
use Text::Wrap;
use Getopt::Long;
It also works great on the atrociously hard to read markup generated by specialized HTML editors and conversion tools, and can help you identify where you need to pay further attention on making your pages more accessible to people with disabilities.
http://www.domtools.com/pub/hindent1.1.0.tar.gz (12 hits) | |
Homepage: | http://www.domtools.com/unix/hindent.shtml (34 hits) |
Changelog: | http://www.domtools.com/pub/hindent1.1.0-changes.txt |
(Perl) Formats and indents HTML code and writes a new file with the results.
Pretty HTML is an easy-to-use program that formats your HTML Web pages. After processing, your HTML code is neatly arranged, commented, spaced, and indented, making it much easier to read and maintain. You can also use Pretty HTML to compress your Web pages by eliminating unnecessary spaces and carriage returns. Process your Web pages one at a time or batch-format entire folders in a single operation. Pretty HTML offers a number of options to ensure that the HTML formatting is done to your liking. To play it extra safe, you can have the program make backup copies of your originals. Excellent online help is included.
Google matched content |
Validators
Society
Groupthink : Two Party System as Polyarchy : Corruption of Regulators : Bureaucracies : Understanding Micromanagers and Control Freaks : Toxic Managers : Harvard Mafia : Diplomatic Communication : Surviving a Bad Performance Review : Insufficient Retirement Funds as Immanent Problem of Neoliberal Regime : PseudoScience : Who Rules America : Neoliberalism : The Iron Law of Oligarchy : Libertarian Philosophy
Quotes
War and Peace : Skeptical Finance : John Kenneth Galbraith :Talleyrand : Oscar Wilde : Otto Von Bismarck : Keynes : George Carlin : Skeptics : Propaganda : SE quotes : Language Design and Programming Quotes : Random IT-related quotes : Somerset Maugham : Marcus Aurelius : Kurt Vonnegut : Eric Hoffer : Winston Churchill : Napoleon Bonaparte : Ambrose Bierce : Bernard Shaw : Mark Twain Quotes
Bulletin:
Vol 25, No.12 (December, 2013) Rational Fools vs. Efficient Crooks The efficient markets hypothesis : Political Skeptic Bulletin, 2013 : Unemployment Bulletin, 2010 : Vol 23, No.10 (October, 2011) An observation about corporate security departments : Slightly Skeptical Euromaydan Chronicles, June 2014 : Greenspan legacy bulletin, 2008 : Vol 25, No.10 (October, 2013) Cryptolocker Trojan (Win32/Crilock.A) : Vol 25, No.08 (August, 2013) Cloud providers as intelligence collection hubs : Financial Humor Bulletin, 2010 : Inequality Bulletin, 2009 : Financial Humor Bulletin, 2008 : Copyleft Problems Bulletin, 2004 : Financial Humor Bulletin, 2011 : Energy Bulletin, 2010 : Malware Protection Bulletin, 2010 : Vol 26, No.1 (January, 2013) Object-Oriented Cult : Political Skeptic Bulletin, 2011 : Vol 23, No.11 (November, 2011) Softpanorama classification of sysadmin horror stories : Vol 25, No.05 (May, 2013) Corporate bullshit as a communication method : Vol 25, No.06 (June, 2013) A Note on the Relationship of Brooks Law and Conway Law
History:
Fifty glorious years (1950-2000): the triumph of the US computer engineering : Donald Knuth : TAoCP and its Influence of Computer Science : Richard Stallman : Linus Torvalds : Larry Wall : John K. Ousterhout : CTSS : Multix OS Unix History : Unix shell history : VI editor : History of pipes concept : Solaris : MS DOS : Programming Languages History : PL/1 : Simula 67 : C : History of GCC development : Scripting Languages : Perl history : OS History : Mail : DNS : SSH : CPU Instruction Sets : SPARC systems 1987-2006 : Norton Commander : Norton Utilities : Norton Ghost : Frontpage history : Malware Defense History : GNU Screen : OSS early history
Classic books:
The Peter Principle : Parkinson Law : 1984 : The Mythical Man-Month : How to Solve It by George Polya : The Art of Computer Programming : The Elements of Programming Style : The Unix Hater�s Handbook : The Jargon file : The True Believer : Programming Pearls : The Good Soldier Svejk : The Power Elite
Most popular humor pages:
Manifest of the Softpanorama IT Slacker Society : Ten Commandments of the IT Slackers Society : Computer Humor Collection : BSD Logo Story : The Cuckoo's Egg : IT Slang : C++ Humor : ARE YOU A BBS ADDICT? : The Perl Purity Test : Object oriented programmers of all nations : Financial Humor : Financial Humor Bulletin, 2008 : Financial Humor Bulletin, 2010 : The Most Comprehensive Collection of Editor-related Humor : Programming Language Humor : Goldman Sachs related humor : Greenspan humor : C Humor : Scripting Humor : Real Programmers Humor : Web Humor : GPL-related Humor : OFM Humor : Politically Incorrect Humor : IDS Humor : "Linux Sucks" Humor : Russian Musical Humor : Best Russian Programmer Humor : Microsoft plans to buy Catholic Church : Richard Stallman Related Humor : Admin Humor : Perl-related Humor : Linus Torvalds Related humor : PseudoScience Related Humor : Networking Humor : Shell Humor : Financial Humor Bulletin, 2011 : Financial Humor Bulletin, 2012 : Financial Humor Bulletin, 2013 : Java Humor : Software Engineering Humor : Sun Solaris Related Humor : Education Humor : IBM Humor : Assembler-related Humor : VIM Humor : Computer Viruses Humor : Bright tomorrow is rescheduled to a day after tomorrow : Classic Computer Humor
The Last but not Least Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand ~Archibald Putt. Ph.D
Copyright � 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.
This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contain some broken links as it develops like a living tree...
|
You can use PayPal to to buy a cup of coffee for authors of this site |
Disclaimer:
The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense so you need to be aware of Google privacy policy. You you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.
Last modified: March 05, 2020