20 Commits

Author SHA1 Message Date
Török Edvin
d5a5fef965 * libclamav/htmlnorm.c: generate only nocomment.html (always contains script too) and notags.html (bb #851)
* libclamav/hashtab.h: len and data were reversed, invalidating entitylist.h
* libclamav/filetypes_int.h: improve HTML filetype detection (bb #853)


git-svn: trunk@3660
2008-02-20 15:49:43 +00:00
Török Edvin
ec774193d3 SVN r3619 broke phishing detection, fixed it
git-svn: trunk@3625
2008-02-13 10:24:51 +00:00
Török Edvin
0664128a61 tagless version of HTML file (bb #162)
fix compiler warning


git-svn: trunk@3619
2008-02-11 21:41:58 +00:00
Török Edvin
b3fc7f9747 use entconv to detect UTF-16BE and UCS-4 variants
use only cli_readline(); we don't need exact conversion
drop unused functions
simplify encoding_norm_readline() and rename it to encoding_normalize_toascii()


git-svn: trunk@3571
2008-02-01 19:38:52 +00:00
Török Edvin
a6de01aa14 handle NULL characters in HTML files (bb #539).
git-svn: trunk@3543
2008-01-25 16:39:40 +00:00
Török Edvin
8b22c9b52a optimize char reference handling
git-svn: trunk@3532
2008-01-23 15:43:32 +00:00
Török Edvin
b0b8398b48 * contrib/entitynorm:
  * use fewer entities, browsers don't support all either.
  * update to generate code for new entconv.
  * no need for configure, use just a simple Makefile (it is an internal tool)
* libclamav/entconv.c, hashtab.c, htmlnorm.c:
  * don't allocate memory for each entity_norm call.
  * don't touch length of mmaped area (bb #785)
  * update htmlnorm to use new entity_norm


git-svn: trunk@3515
2008-01-21 15:52:21 +00:00
Török Edvin
4e1127c594 AC_TRY_LINK already adds a main(), remove duplicate main()
entconv improvements to improve security and performance
	Part I for (bb #686, #386)
	TODO:
	* optimize entity_norm
	* create testfiles for unicode encoding variants
	* create a regression test
	* check for memory leaks


git-svn: trunk@3511
2008-01-20 22:18:14 +00:00
Tomasz Kojm
0808081e13 properly truncate long URLs (Edwin, bb#645)
git-svn: trunk@3372
2007-12-06 14:53:22 +00:00
Tomasz Kojm
45d6cbd9a8 fix possible NULL dereference (bb#582)
git-svn: trunk@3185
2007-08-21 20:30:15 +00:00
Tomasz Kojm
1c6fa20917 fix possible NULL dereference (bb#582)
git-svn: trunk@3184
2007-08-21 20:27:40 +00:00
Török Edvin
736112931b handle & in URLs, even with entity-converter off; don't leave & in URLs (bb #535)


git-svn: trunk@3100
2007-06-16 17:03:42 +00:00
Tomasz Kojm
84fd5a614c fix some possible error path leaks by changing cli_realloc() to cli_realloc2()
git-svn: trunk@3064
2007-05-25 23:10:58 +00:00
Török Edvin
5e2a487ca8 fix uninitialized value warning
git-svn: trunk@3047
2007-05-01 20:13:27 +00:00
Török Edvin
462e8e5eb3 apply next set of patches for enabling phishing code
git-svn: trunk@3043
2007-05-01 16:46:52 +00:00
Török Edvin
f74bc8271b Update code to use new AC matcher.
Fix URL truncation.


git-svn: trunk@3039
2007-04-28 20:15:22 +00:00
Török Edvin
e4ba6d85cc leave <0x20 characters untouched in cl_experimental (don't normalize them to &xx;)
git-svn: trunk@2942
2007-03-14 19:21:52 +00:00
Török Edvin
66f7a69148 ampersands were missed in URLs (bb #377).
git-svn: trunk@2905
2007-03-05 19:31:17 +00:00
Török Edvin
6b53b2341d Better handling of empty charset in meta tag.
git-svn: trunk@2901
2007-03-03 23:00:14 +00:00
Sven Strickroth
a99111f050 remove old CVS-stuff and make the repository look more like SVN
git-svn: trunk@2755
2007-02-17 19:02:20 +00:00