Commit Graph

169 Commits

Author SHA1 Message Date
Vytautas Šaltenis
0b69796248 Go style: more Html -> HTML renames 2016-04-01 15:37:21 +03:00
Vytautas Šaltenis
02a5ce37ff Go style: Html{Block|Span} -> HTML{Block|Span} 2016-04-01 13:15:47 +03:00
Vytautas Šaltenis
32802dbae5 Go style: rename Toc to TOC 2016-04-01 13:12:38 +03:00
Vytautas Šaltenis
f1361aa0da Make ForEachNode func a Walk method on Node 2016-04-01 12:36:56 +03:00
Vytautas Šaltenis
4ba991937b Store cell alignment in own type instead of int 2016-04-01 11:44:59 +03:00
Vytautas Šaltenis
60026cc3c6 Make ListData a nested struct instead of pointer 2016-04-01 11:21:25 +03:00
Vytautas Šaltenis
2a07386455 Rename HtmlFlags to HTMLFlags to adhere to Go style 2016-04-01 10:49:23 +03:00
Vytautas Šaltenis
71fe9a191e Remove dead code 2016-04-01 10:48:25 +03:00
Vytautas Šaltenis
a55b2615a4 Run Smartypants as a separate pass over the AST
Separate Smartypants somewhat from the HTML renderer. Move its flags
from HtmlFlags to Extensions (probably should be moved to its own set of
flags, but not now). With that done, do a separate walk of the tree and
either run Smartypants processor if it's enabled, or simply escape text
nodes.
2016-04-01 10:44:22 +03:00
Vytautas Šaltenis
7869a127bd Combine two Smartypants structs into one
Combine smartypantsRenderer and smartypantsData into one struct. Make
action funcs methods on that struct.
2016-03-31 21:40:37 +03:00
Vytautas Šaltenis
4a7ff562a7 Rename Html to HTML to adhere to Go style 2016-03-31 13:54:09 +03:00
Vytautas Šaltenis
fd2d69de5e Make renderer write to an explicit io.Writer 2016-03-30 21:13:02 +03:00
Vytautas Šaltenis
dc7d4b68df Remove some cruft 2016-03-30 15:56:53 +03:00
Vytautas Šaltenis
0382dab0c3 The single node renderer is a separate func now
A default HTML renderer for a single node is now easily accessible.
Makes it easy to fall back to the default behavior when writing custom
HTML renderers.
2016-03-30 15:48:43 +03:00
Vytautas Šaltenis
886a1405c0 Extract local funcs/vars into methods/members 2016-03-30 15:37:03 +03:00
Vytautas Šaltenis
94893247d1 Add a new renderer from AST
This is the new renderer that walks AST and renders everything to a
buffer. Completely covers all the functionality of the previous renderer
and will likely replace it.
2016-03-30 12:54:12 +03:00
Vytautas Šaltenis
7846a310ea Remove unused code 2016-03-30 12:54:12 +03:00
Vytautas Šaltenis
d1b544e278 HACK: render TOC the old way, backup and truncate output 2015-11-10 21:36:32 +02:00
Vytautas Šaltenis
97235182ac Enable writing plain text straight to output
It's only used in a single place and should probably be refactored away,
but this workaround is OK for now.
2015-11-10 21:36:32 +02:00
Vytautas Šaltenis
7a97ffe689 Remove almost all uses of 'out' in HTML renderer 2015-11-10 21:36:32 +02:00
Vytautas Šaltenis
08233481ed Fix Begin/EndHeader to use the new 'out'-less interface
Remove the 'out' parameter. Also, instead of returning and passing the
position of TOC, use CopyWrites to capture contents of the header and
pass that captured buffer instead.
2015-11-10 21:36:32 +02:00
Vytautas Šaltenis
dce6df90b9 Add infrastructure to collect output in a buffer
Add a structure to collect output in a buffer (replaces what used to be
the 'out' parameter all over the place).

Notable things about this struct are the captureBuff and copyBuff
buffers. They're intended to redirect all the output (captureBuff) or
make a copy of all the output (copyBuff) while they're set to non-nil.
Here's an example of their intended use:

    // what used to be a temp buffer as an 'out' parameter
    //     var cellWork bytes.Buffer
    //     p.inline(&cellWork, data[cellStart:cellEnd])
    // can now be captured like this:
    cellWork := p.r.CaptureWrites(func() {
           p.inline(data[cellStart:cellEnd])
    })
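
A minimal sketch of how such a capture/copy writer could behave, written for illustration only. The type name `outputWriter`, its `output` field and the `Write` method are assumptions; only `captureBuff`, `copyBuff`, `CaptureWrites` and `CopyWrites` are names taken from these commit messages, and the real implementation may differ.

```go
package main

import (
	"bytes"
	"fmt"
)

// outputWriter is a hypothetical stand-in for the output-collecting struct.
type outputWriter struct {
	output      bytes.Buffer  // final rendered output
	captureBuff *bytes.Buffer // while non-nil, writes are redirected here
	copyBuff    *bytes.Buffer // while non-nil, writes are duplicated here
}

// Write copies to copyBuff if set, then goes to captureBuff if set,
// otherwise to the real output.
func (w *outputWriter) Write(p []byte) (int, error) {
	if w.copyBuff != nil {
		w.copyBuff.Write(p)
	}
	if w.captureBuff != nil {
		return w.captureBuff.Write(p)
	}
	return w.output.Write(p)
}

// CaptureWrites runs fn and returns what it wrote, keeping it out of output.
func (w *outputWriter) CaptureWrites(fn func()) []byte {
	var buf bytes.Buffer
	prev := w.captureBuff
	w.captureBuff = &buf
	fn()
	w.captureBuff = prev
	return buf.Bytes()
}

// CopyWrites runs fn and returns a copy of what it wrote; output still gets it.
func (w *outputWriter) CopyWrites(fn func()) []byte {
	var buf bytes.Buffer
	prev := w.copyBuff
	w.copyBuff = &buf
	fn()
	w.copyBuff = prev
	return buf.Bytes()
}

func main() {
	var w outputWriter
	w.Write([]byte("<tr>"))
	cell := w.CaptureWrites(func() { w.Write([]byte("cell")) })
	hdr := w.CopyWrites(func() { w.Write([]byte("<h1>T</h1>")) })
	fmt.Printf("captured=%q copied=%q output=%q\n", cell, hdr, w.output.String())
}
```

Saving and restoring the previous buffer lets captures nest, which is one plausible reason to prefer this shape over a plain flag.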
2015-11-10 21:36:32 +02:00
Vytautas Šaltenis
352ffdefa4 Remove a bunch of 'out' parameters from calls, WIP
Still not all of them, still broken.
2015-11-10 21:36:32 +02:00
Vytautas Šaltenis
6e42506fcc Remove 'out' parameter from renderer interface
This only removes the parameter from func declarations, not from their
bodies, so obviously breaks everything. Will be restored in upcoming
commits.
2015-11-10 21:36:31 +02:00
Vytautas Šaltenis
29f02f7d01 Rename Renderer method receivers
From 'options' to 'r'. This change contains only a massive rename, no
other changes.
2015-11-10 21:08:32 +02:00
Vytautas Šaltenis
bc4735b84d Remove callback from Footnotes renderer event
Split Footnotes into two events: BeginFootnotes and EndFootnotes,
removing the need for callback.
2015-11-10 21:08:32 +02:00
Vytautas Šaltenis
6d6be3d2b2 Remove callback from Paragraph renderer event
Split Paragraph into two events: BeginParagraph and EndParagraph,
removing the need for callback.
2015-11-10 21:08:32 +02:00
Vytautas Šaltenis
af1b26fa04 Remove callback from List renderer event
Split List into two events: BeginList and EndList, removing the need for
callback.
2015-11-10 21:08:32 +02:00
Vytautas Šaltenis
82be6cab6d Remove callback from Header renderer event
Split Header into two events: BeginHeader and EndHeader, removing the
need for callback.
2015-11-10 21:08:32 +02:00
Vytautas Šaltenis
b16c9b3787 Simplify callbacks in Renderer interface
The callbacks used to return bools, but none of the actual
implementations return false, always true. So in order to make further
refactorings simpler, make the interface reflect the inner workings: no
more return values, no more conditionals.
2015-11-10 21:08:32 +02:00
Vytautas Šaltenis
ee98bc0bf4 Massive replacement of C_STYLE flags to typed ones 2015-11-10 21:08:32 +02:00
Vytautas Šaltenis
06515e9125 Rename public constants to idiomatic Go 2015-11-10 20:27:34 +02:00
Anthony Fok
38cc6e9ae8 Add HTML_SMARTYPANTS_DASHES for toggling smart dashes 2015-08-03 23:57:26 -06:00
Vincent Batoufflet
c4825a719d Add definition lists extension support 2015-06-03 08:03:34 +02:00
Dmitri Shuralyov
18186eea26 Do not emit newline after <img> tag.
This changes HTML renderer not to always add a newline character after
<img> tags. This is desirable because <img> tags can be inlined, and
sometimes you want to avoid whitespace on left and right sides. Previous
behavior of always adding a newline would unavoidably create whitespace
after <img> tag.

Update all tests to match new behavior. There are few changes, and
they're completely isolated to inline image tests.

Fixes #169.
2015-05-25 12:59:05 -07:00
Vytautas Šaltenis
c6be4fadb1 Merge pull request #161 from rtfb/issue-146
Issue 146
2015-05-06 15:30:31 +03:00
Vytautas Šaltenis
a2702e7449 Simplify isRelativeLink() a bit 2015-04-11 18:06:30 +03:00
Vytautas Šaltenis
b3137e7c8f Merge pull request #152 from elian0211/about_links
update about links
2015-04-09 20:41:45 +03:00
Vytautas Šaltenis
f4655604b3 Cleanup a random bunch of repetitive loops
Replace them with helper function calls.
2015-04-07 21:59:42 +03:00
Beyang Liu
60b0b4024f add rel="noreferrer" option 2015-03-14 16:46:32 -07:00
elian0211
27ba4cebef update about links
when linking to the current directory or parent directory
2015-02-20 17:06:55 +08:00
Dmitri Shuralyov
f4bb968b5f Minor cleanup.
Apply gofmt on html.go.
Apply goimports-compatible formatting on block.go (space between standard and third party imports).
Move Travis build status image in a more pleasing, common location.
Remove "Markdown pretty-printer output engine" from TODO steps; this is already done in markdownfmt.
Remove unneeded trailing whitespace in README.
2014-11-29 20:41:11 -08:00
Vytautas Šaltenis
315f87d8c0 Merge pull request #128 from bjornerik/angled-quotes
Add support for angled, double quotes
2014-11-28 19:33:07 +02:00
Austin Ziegler
9c061de92b Allow configurable header ID prefix/suffixes.
This is specifically driven by the Hugo usecase where multiple documents
are often rendered into the same ultimate HTML page.

When a header ID is written to the output HTML format (either through
`HTML_TOC`, `EXTENSION_HEADER_IDS`, or `EXTENSION_AUTO_HEADER_IDS`), it
is possible that multiple documents will have identical header IDs. To
permit validation to pass, it is useful to have a per-document prefix or
suffix (in our case, an MD5 of the content filename, and we will be
using it as a suffix).

That is, two documents (`A` and `B`) that have the same header ID (`#
Reason {#reason}`), will end up having an actual header ID of the form
`#reason-DOCID` (e.g., `#reason-A`, `#reason-B`) with these HTML
parameters.

This is built on top of #126 (more intelligent collision detection for
`EXTENSION_AUTO_HEADER_IDS`).
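
As a rough illustration of the idea (not the library's actual API), the sketch below derives a per-document suffix from an MD5 of the content filename and appends it to a header ID. The function names `docSuffix` and `decorateHeaderID` and the example filenames are hypothetical.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// docSuffix (hypothetical) derives a per-document suffix from the filename.
func docSuffix(filename string) string {
	sum := md5.Sum([]byte(filename))
	return "-" + hex.EncodeToString(sum[:])
}

// decorateHeaderID (hypothetical) wraps a header ID in a prefix and suffix.
func decorateHeaderID(prefix, id, suffix string) string {
	return prefix + id + suffix
}

func main() {
	// Two documents share the header ID "reason" but end up with distinct IDs.
	for _, name := range []string{"content/a.md", "content/b.md"} {
		fmt.Println(decorateHeaderID("", "reason", docSuffix(name)))
	}
}
```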
2014-11-23 20:37:27 -05:00
Austin Ziegler
40f28ee022 Prevent generated header collisions, less naively.
> This is a rework of an earlier version of this code.

The automatic header ID generation code submitted in #125 has a subtle
bug where it will use the same ID for multiple headers with identical
text. In the case below, all the headers are rendered as `<h1
id="header">Header</h1>`.

  ```markdown
  # Header
  # Header
  # Header
  # Header
  ```

This change is a simple but robust approach that uses an incrementing
counter and pre-checking to prevent header collision. (The above would
be rendered as `header`, `header-1`, `header-2`, and `header-3`.) In
more complex cases, it will append a new counter suffix (`-1`), like so:

  ```markdown
  # Header
  # Header 1
  # Header
  # Header
  ```

This will generate `header`, `header-1`, `header-1-1`, and `header-1-2`.

This code has two additional changes over the prior version:

1.  Rather than reimplementing @shurcooL’s anchor sanitization code, I
    have imported it from
    `github.com/shurcooL/go/github_flavored_markdown/sanitized_anchor_name`.

2.  The markdown block parser is now only interested in *generating* a
    sanitized anchor name, not with ensuring its uniqueness. That code
    has been moved to the HTML renderer. This means that if the HTML
    renderer is modified to identify all unique headers prior to
    rendering, the hackish nature of the collision detection can be
    eliminated.
2014-11-23 20:35:43 -05:00
bep
857a1a0260 Add support for angled, double quotes
The flag `HTML_SMARTYPANTS_ANGLED_QUOTES` combined with `HTML_USE_SMARTYPANTS` configures rendering of double quotes as angled left and right quotes (&laquo; &raquo;).

The SmartyPants documentation mentions a special syntax for these, `<<>>`, a syntax neither pretty nor user friendly.

A typical site would use one style or the other, or both, but never both in the same document. For example, a person from Norway who has a blog in both English and Norwegian (his native tongue) would configure Blackfriday to use angled quotes for the Norwegian section, but keep regular double quotes for the English one.

If the flag `HTML_SMARTYPANTS_ANGLED_QUOTES` is not provided, everything works as before this commit.
2014-11-05 23:29:41 +01:00
Austin Ziegler
8cc40f8e07 Use supplied header ID for TOC rendering.
- Fixes #112 so that `#header {#header-id}` renders the TOC with
  `#header-id` instead of `#toc_1`.
2014-10-27 16:49:28 -04:00
Vytautas Saltenis
cf6bfc9d6d Rip out all of blackfriday's HTML sanitization effort
As per discussion in issue #90.
2014-09-19 21:25:23 +03:00
tummychow
67002b01b6 Use HTML5 recommended style of language on code blocks
For code blocks that contain a certain language of code, the recommended
attribute structure is <pre><code class="language-foo">. This also
corresponds to the behavior expected by various JS syntax highlighters.
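
For illustration, a renderer producing that attribute structure might look like the sketch below. The function name `codeBlockHTML` is hypothetical and this is not the library's actual code.

```go
package main

import (
	"fmt"
	"html"
)

// codeBlockHTML (hypothetical) renders a code block, adding a
// class="language-..." attribute only when a language is given.
func codeBlockHTML(lang, code string) string {
	if lang == "" {
		return "<pre><code>" + html.EscapeString(code) + "</code></pre>\n"
	}
	return fmt.Sprintf("<pre><code class=\"language-%s\">%s</code></pre>\n",
		html.EscapeString(lang), html.EscapeString(code))
}

func main() {
	fmt.Print(codeBlockHTML("go", "if a < b { return }"))
	fmt.Print(codeBlockHTML("", "plain text"))
}
```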

The GitHub code block implementation was obsolete, and identical to the
normal implementation except for its attribute structure, so it was
removed.

Closes #108.
2014-08-28 18:01:06 -04:00
Brian Goff
539b27a624 Add titleblock support 2014-08-04 14:08:22 -04:00
Daniel Imfeld
5bf00efe39 Remove unnecessary HTML_ABSOLUTE_LINKS flag 2014-05-29 09:17:20 -05:00
Daniel Imfeld
10f1dc6358 Fix spelling error 2014-05-28 23:52:45 -05:00
Daniel Imfeld
628c02d37b Move footnote prefix to a better place 2014-05-24 14:28:37 -05:00
Daniel Imfeld
c7f4b178c2 Use parameters object for extra options. Enhance footnote support.
Option to add return links.
Option to make footnote prefixes unique, for rendering multiple
documents per page.
2014-05-24 13:29:39 -05:00
Daniel Imfeld
ec41294bc4 Add footnote prefix option. Needs testing 2014-05-24 02:55:13 -05:00
Daniel Imfeld
5c12499aa1 Add ability to convert relative links to absolute 2014-05-18 01:28:15 -05:00
Vytautas Šaltenis
3dba5bc56e Merge branch 'master' of github.com:gihnius/blackfriday into gihnius-master
Conflicts:
	html.go
	inline_test.go
2014-05-01 21:43:42 +03:00
Martin Probst
41251715ad Use go.net/html's parser to sanitize HTML.
Use an HTML5 compliant parser that interprets HTML as a browser would to parse
the Markdown result and then sanitize based on the result.
Escape unrecognized and disallowed HTML in the result.
Currently works with a hard coded whitelist of safe HTML tags and attributes.
2014-04-27 23:40:44 +02:00
willnix
be9cbc634a tagWhitelist allows alignment attribute now
This is the closest I could get to removing everything "unsafe" without introducing an additional regex.
2014-04-19 21:59:04 +00:00
willnix
c1e4996787 Add table tags to the whitelist.
Fixing:
55cd82008e

This commit introduced a html tag whitelist which does not include any table tags (<td>,<tr>,<thead>...). Therefore even tables the markdown parser itself generated will be removed.
2014-04-17 15:44:40 +00:00
Vytautas Šaltenis
c5ece173ad Merge pull request #59 from johnsto/master
Header ID specifiers
2014-04-11 21:31:27 +03:00
Dave Johnston
2dff0864f0 Add header ID support and tests: # Header {#myid} 2014-04-05 20:42:58 +01:00
Kjetil Mehl
786aed6213 Explicit return byte array at end of function. 2014-04-05 16:59:28 +02:00
Vytautas Šaltenis
55bb56bf9b Merge pull request #55 from rtfb/master
Autolink fixes
2014-03-30 19:58:39 +03:00
Vytautas Šaltenis
d643453f1e Merge pull request #50 from rtfb/master
Better protection against JavaScript injection
2014-03-30 19:52:13 +03:00
gihnius
93484b1424 add nofollow ref for non internal links only 2014-03-21 11:14:58 +08:00
gihnius
ecf59d4a55 add target blank attr 2014-03-21 10:52:46 +08:00
Graham Miller
d71c759108 add HTML_NOFOLLOW_LINKS 2014-02-25 09:21:57 -05:00
Vytautas Šaltenis
b0bdfbec4c Fix bug in autolink overescaping html entities
If autolink encounters a link which already has an escaped html entity,
it would escape the ampersand again, producing things like these:
    &amp;  --> &amp;amp;
    &quot; --> &amp;quot;
This commit solves that by first looking for all entity-looking things
in the link and copying those ranges verbatim, only considering the rest
of the string for escaping.
Doesn't seem to have considerable performance impact.
The mailto: links are processed the old way.
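
The approach can be sketched roughly as below: find entity-looking ranges, copy them verbatim, and escape only the text between them. This is an assumption-laden illustration using a regular expression and the standard `html` package; the name `escapeKeepingEntities` and the entity pattern are hypothetical, and the actual commit may scan the bytes by hand instead.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// entityRe is an assumed pattern for "entity-looking things":
// named (&amp;), decimal (&#38;) and hexadecimal (&#x26;) references.
var entityRe = regexp.MustCompile(`&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);`)

// escapeKeepingEntities (hypothetical) escapes link text but leaves
// already-escaped entities untouched.
func escapeKeepingEntities(link string) string {
	var b strings.Builder
	last := 0
	for _, loc := range entityRe.FindAllStringIndex(link, -1) {
		b.WriteString(html.EscapeString(link[last:loc[0]])) // text before the entity
		b.WriteString(link[loc[0]:loc[1]])                  // the entity, verbatim
		last = loc[1]
	}
	b.WriteString(html.EscapeString(link[last:])) // trailing text
	return b.String()
}

func main() {
	fmt.Println(escapeKeepingEntities("http://example.com/?a=1&amp;b=2&c=3"))
}
```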
2014-02-17 21:09:04 +02:00
Vytautas Šaltenis
cc0d56d092 Extract a chain of ifs into separate func
This gives a ~10% slowdown of a full test run, which is tolerable.
Switch statement is still slightly slower (~5%). Using map turned out to
be unacceptably slow (~3x slowdown).
2014-02-17 21:09:04 +02:00
Vytautas Šaltenis
31a96c6ce7 go fmt 2014-02-17 21:09:03 +02:00
Vytautas Šaltenis
2f50a53f8e Rename HTML_SKIP_SCRIPT to HTML_SANITIZE_OUTPUT 2014-01-22 01:23:43 +02:00
Vytautas Šaltenis
55cd82008e Rewrite protection against JavaScript injection
This drops the naive approach at <script> tag stripping and resorts to
full sanitization of html. The general idea (and the regexps) is grabbed
from Stack Exchange's PageDown JavaScript Markdown processor[1]. Like in
PageDown, it's implemented as a separate pass over resulting html.

Includes a metric ton (but not all) of test cases from here[2]. Several
are commented out since they don't pass yet.

Stronger (but still incomplete) fix for #11.

[1] http://code.google.com/p/pagedown/wiki/PageDown
[2] https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet
2014-01-22 01:14:35 +02:00
Vytautas Šaltenis
e02c392dc6 Extract useful code to separate func 2014-01-22 00:45:43 +02:00
David Kitchen
6e6572e913 Added th to table headers so that styling with things like Twitter Bootstrap and typeset.css work as expected. Cells in headers should always be TH unless they are advisory cells within headers in which case TD is acceptable (but being Markdown a user with such needs could just enter HTML for this) 2013-10-16 11:36:33 +01:00
moshee
c23099e5ee Implementation and some tests for inline footnotes. Also I noticed the list items had the wrong ids, that was silly of me. 2013-07-01 01:37:52 +00:00
moshee
7bdb82c53a new tests pass but old tests now fail... 2013-06-26 15:57:51 +00:00
moshee
be082a1ef2 First attempt at supporting Pandoc-style footnotes. The existing tests have not broken but the new functionality does not work yet. 2013-06-25 01:18:47 +00:00
Vytautas Šaltenis
8226238289 Improve html element stripping code 2013-04-18 03:15:47 +03:00
Vytautas Šaltenis
dcaaa9b5dc More <script> stripping
Partially addresses issue #11.
2013-04-13 23:24:30 +03:00
Vytautas Šaltenis
fb923cdb78 Add an option to strip <script> elements
Partially addresses issue #11.
2013-04-13 22:57:16 +03:00
Vytautas Šaltenis
b79e720a36 Make isHtmlTag() case insensitive 2013-04-13 22:34:37 +03:00
Vytautas Šaltenis
a2fda5e98f Extract repetitive code to a func 2013-04-13 22:26:29 +03:00
Vytautas Šaltenis
d5a8df164b Fix bug in isHtmlTag()
Fix what seems to be a typo. j should iterate through all tagname, so it
should be initialized to zero. The test exposes this bug.
2013-04-13 22:21:47 +03:00
Caleb Spare
a25d9a543f Fix html tag ordering in doc string. 2012-11-22 12:52:56 -08:00
Caleb Spare
d0d854958e Fix up method documentation formatting. 2012-11-22 12:12:08 -08:00
moshee
8a86b6d6be HTML5 doctype, Wrap TOC with <nav>
<nav> makes the TOC more easily identifiable and workable with CSS.
2012-10-21 21:23:44 -07:00
Russ Ross
a5441fd99f updates for go 1 2012-03-07 21:36:31 -07:00
Russ Ross
530123dd9f additional doc comments 2011-07-07 12:05:29 -06:00
Russ Ross
bb8ee591d1 doc improvements, commenting 2011-07-07 11:56:45 -06:00
Russ Ross
bd60e3691b removing more redundant checks, additional cleanup of block parsing 2011-07-01 14:13:26 -06:00
Russ Ross
689f6cb79b more consistent spacing of block-level elements 2011-07-01 11:19:42 -06:00
Russ Ross
ae9562f685 move whitespace stripping to parser, not renderers 2011-06-29 15:38:35 -06:00
Russ Ross
d3c8225096 corner case spacing issue with table of contents 2011-06-29 13:24:15 -06:00
Russ Ross
2aca667078 simplify inline callback interface 2011-06-29 13:00:54 -06:00
Russ Ross
3c6f18afc7 Renderer is now an interface 2011-06-29 11:13:17 -06:00
Russ Ross
793fee5451 preparing for switch to rendering interface 2011-06-29 10:43:10 -06:00
Russ Ross
55697351d0 table of contents support beefed up 2011-06-29 10:36:56 -06:00
Russ Ross
873a60ad49 complete page rendering is now an option in the library 2011-06-29 10:08:56 -06:00
Russ Ross
b1a0318250 refactoring: inline renderers return bools, preparing rendering struct to become an interface 2011-06-28 19:46:35 -06:00