Initial commit

2021-05-15 15:52:00 +02:00
commit 44fed4e167
95 changed files with 53173 additions and 0 deletions
--- a/webalizer-2.23-08/DNS.README
+++ b/webalizer-2.23-08/DNS.README
@@ -0,0 +1,295 @@
+The Webalizer - A log file analysis program  -- DNS information
+
+The webalizer has the ability to perform reverse DNS lookups,  and
+fully supports both IPv4 and IPv6 addressing schemes.  This document
+attempts to explain how it works, and some things that you should be
+aware of when using the DNS lookup features.
+
+Note: The Reverse DNS feature may be enabled or disabled at compile
+      time.  DNS lookup code is enabled by default.  You can run The
+      Webalizer using the '-vV' command line options to determine what
+      options are enabled in the version you are using.
+
+
+How it works
+------------
+
+DNS lookups are made against a DNS cache file containing IP addresses
+and resolved names.  If the IP address is not found in the cache file,
+it will be left as an IP address.  In order for this to happen, a
+cache file MUST be specified when the Webalizer is run, either using
+the '-D' command line switch, or a "DNSCache" configuration file
+keyword.  If no cache file is specified, no attempts to perform DNS
+lookups will be done. The cache file can be made three different ways.
+
+1) You can have the Webalizer pre-process the specified log file at
+   run-time, creating the cache file before processing the log file
+   normally.  This is done by setting the number of DNS Children
+   processes to run, either by using the '-N' command line switch or
+   the "DNSChildren" configuration keyword.  This will cause the
+   Webalizer to spawn the specified number of processes which will
+   be used to do reverse DNS lookups.. generally, a larger number
+   of processes will result in faster resolution of the log, however
+   if set too high may cause overall system degradation.  A setting
+   of between 5 and 20 should be acceptable, and there is a maximum
+   limit of 100.   If used, a cache filename MUST be specified also,
+   using either the '-D' command line switch, or the "DNSCache"
+   configuration keyword.  Using this method, normal processing will
+   continue only after all IP addresses have been processed, and the
+   cache file is created/updated.
+
+2) You can pre-process the log file as a standalone process, creating
+   the cache file that will be used later by the Webalizer.  This is
+   done by running the Webalizer with a name of 'webazolver' (ie: the
+   name 'webazolver' is a symbolic link to 'webalizer') and specifying
+   the cache filename (either with '-D' or DNSCache).   If the number
+   of child processes is not given, the default of 5 will be used. In
+   this mode, the log will be read and processed, creating a DNS cache
+   file or updating an existing one, and the program will then exit
+   without any further processing.
+
+3) You can use The Webalizer (DNS) Cache file Manager program 'wcmgr'
+   to create and manipulate a cache file.  A blank cache file can be
+   created which would be later populated, or data for the cache file
+   can be imported using tab delimited text files.  See the wcmgr(1)
+   man page for usage information.
+
+
+Run-time DNS cache file creation/update
+---------------------------------------
+
+The creation/update of a DNS cache file at run-time occurs as follows:
+
+1) The log file is read, creating a list of all IP addresses that are
+   not already cached (or cached but expired) and need to be resolved.
+   Addresses are expired based on the TTL value specified using the
+   'CacheTTL' configuration option or after 7 days (default) if no TTL
+   is specified.
+
+2) The specified number of children processes are forked, and are used
+   to perform DNS lookups.
+
+3) Each IP address is given, one at a time, to the next available child
+   process until all IP addresses have been processed.  Each child will
+   update the cache file when a result is returned.  This may be either
+   a resolved name or a failed lookup, in which case the address will be
+   left unresolved.  Unresolved addresses are not normally cached, but
+   can be, if enabled using the 'CacheIPs' configuration file keyword.
+
+4) Once all IP addresses have been processed and the cache file updated,
+   the Webalizer will process the log normally.  Each record it finds
+   that has an unresolved IP address will be looked up in the cache file
+   to see if a hostname is available (ie: was previously found).
+
+Because there may be a significant amount of time between the initial
+unresolved IP list and normal processing, the Webalizer should not be
+run against live log files (ie: a log file that is actively being written
+to by a server), otherwise there may be additional records present that
+were not resolved.
+
+
+Stand-Alone DNS cache file creation/update
+------------------------------------------
+
+The creation/update of the DNS cache file, when run in stand-alone mode,
+occurs as follows:
+
+1) The log file is read, creating a list of all IP addresses that are
+   not already cached (or cached but expired) and need to be resolved.
+
+2) The specified number of children processes are forked, and are used
+   to perform DNS lookups. If the number of processes was not specified,
+   the default of 5 will be used.
+
+3) Each IP address is given, one at a time, to the next available child
+   process until all IP addresses have been processed.  Each child will
+   update the cache file when a result is returned.
+
+4) Once all IP addresses have been processed and the cache file updated,
+   the program will terminate without any further processing.
+
+
+Larger sites may prefer to use a stand-alone process to create the DNS
+cache file, and then run the Webalizer against the cache file.  This
+allows a single cache file to be used for many virtual hosts, and reduces
+the processing needed if many sites are being processed.  The Webalizer
+can be used in stand alone mode by running it as 'webazolver'.  When
+run in this fashion, it will only create the cache file and then exit
+without any further processing.  A cache filename MUST be specified,
+however unlike when running the Webalizer normally, the number of child
+processes does not have to be given (will default to 5).  All normal
+configuration and command line options are recognized, however, many
+of them will simply be ignored.. this allows the use of a standard
+configuration file for both normal use and stand alone use.
+
+
+Examples:
+---------
+
+webalizer -c test.conf -N 10 -D dns_cache.db /var/log/my_www_log
+
+   This will use the configuration file 'test.conf' to obtain normal
+   configuration options such as hostname and output directory.. it
+   will then either create or update the file 'dns_cache.db' in the
+   default output directory (using 10 child processes) based on the
+   IP addresses it finds in the log /var/lib/my_www_log, and then
+   process that log file normally.
+
+
+webalizer -o out -D dns_cache.db /var/log/my_www_log
+
+  This will process the log file /var/log/my_www_log, resolving IP
+  addresses from the cache file 'dns_cache.db' found in the default
+  output directory "out".  The cache file must be present as it will
+  not be created with this command.
+
+
+for i in /var/log/*/access_log; do
+  webazolver -N 20 -D /var/lib/dns_cache.db $i
+done
+
+  The above is an example of how to run through multiple log files
+  creating a single DNS cache file.. this might be typically used on
+  a larger site that has many virtual hosts, all keeping their log
+  files in a separate directory.  It will process each access_log it
+  finds in /var/log/* and create a cache file (var/lib/dns_cache.db).
+  This cache file can then be used to process the logs normally with
+  with the Webalizer in a read-only fashion (see next example).
+
+
+for i in /etc/webalizer/*.conf; do webalizer -c $i -D /etc/cache.db; done
+
+  This will process each configuration file found in /etc/webalizer,
+  using the DNS cache file /etc/cache.db.  This will also typically be
+  used on a larger site with multiple hosts..  Each configuration file
+  will specify a site specific log file, hostname, output directory, etc.
+  The cache file used will typically be created using a command similar
+  to the one previous to this example.
+
+
+Cache File Maintenance
+----------------------
+
+The Webalizer DNS cache files generally require very little or no
+special attention.  There are times though when some maintenance
+is required, such as occasional purging of very old cache entries.
+The Webalizer never removes a record once it's inserted into the
+cache.  If a record expires based on its timestamp, the next time
+that address is seen in a log, its name is looked up again and the
+timestamp is updated.  However, there will always be addresses that
+are never seen again, which will cause the cache files to continue
+to grow in size over time.  On extremely busy sites or sites that
+attract many one time visitors, the cache file may grow extremely
+large, yet only contain a small amount of valid entries.  Using
+The Webalizer (DNS) Cache file Manager ('wcmgr'), cache files can
+be purged, removing expired entries and shrinking the file size.
+A TTL (time to live) value can be specified, so the length of time
+an entry remains in the cache can be varied depending on individual
+site requirements.  In addition to purging cache files, 'wcmgr' can
+also be used to list cache file contents, import/export cache data,
+lookup/add/delete individual entries and gather overall statistics
+regarding the cache file (number of records, number expired, etc..).
+
+To purge a cache file using 'wcmgr', an example command would be:
+
+wcmgr -p31 /path/to/dns.cache
+
+This would purge the 'dns.cache' cache file of any records that are
+over 31 days old, and would reclaim the space that those records
+were using in the file.  If you would like to see the records that
+get purged, adding the command line option '-v' (verbose) will cause
+the program to print each entry and its age as they are removed.
+You can also use the 'wcmgr' to display statistics on cache files
+to aid in determining when a cache file should be purged.  See the
+'wcmgr' man page (wcmgr.1) for additional information on the various
+options available.
+
+
+Stupid Cache Tricks
+-------------------
+
+The DNS cache files used by The Webalizer allow for efficient IP address
+to name translations.  Resolved names are normally generated by using an
+existing DNS name server to query the address, either locally or over
+the Internet.  However, using The Webalizer (DNS) Cache file Manager,
+almost any IP address to Name translation can be included in the cache.
+One such example would be for mapping local network addresses to real
+names, even though those addresses may not have real DNS entries on the
+network (or may be 'local' addresses prohibited from use on the Internet).
+A simple tab delimited text file can be created and imported into a cache
+for use by The Webalizer,  which will then be used to convert the local
+IP addresses to real names.  Additional configuration options for The
+Webalizer can then be used as would be normally.  For example, consider
+a small business with 10 computers and a DSL router to the Internet.
+Each machine on the local network would use a private IP address that
+would not be resolved using an external (public) DNS server, so would
+always be reported by The Webalizer as 'unknown/unresolved'.  A simple
+cache file could be created to map those unresolved addresses into more
+meaningful names, which could then be further processed by the Webalizer.
+An example might look something like:
+
+# Local machines
+192.168.123.254	0	0	gw.widgetsareus.lan
+192.168.123.253	0	0	mail.widgetsareus.lan
+192.168.123.250	0	0	sales.widgetsareus.lan
+192.168.123.240	0	0	service.widgetsareus.lan
+192.168.123.237	0	0	mgr.widgetsareus.lan
+192.168.123.235	0	0	support1.widgetsareus.lan
+192.168.123.234	0	0	support2.widgetsareus.lan
+192.168.123.232	0	0	pres.widgetsareus.lan
+192.168.123.230	0	0	vp.widgetsareus.lan
+192.168.123.225	0	0	reception.widgetsareus.lan
+192.168.123.224	0	0	finance.widgetsareus.lan
+127.0.0.1	0	1	127.0.0.1
+
+
+There are a couple of things here that should be noted.  The first
+is that the timestamps (first zero on each line above) are set to
+zero.  This tells The Webalizer that these cached entries are to
+be considered 'permanent', and should never be expired (infinite
+TTL or time to live).  The second thing to note is that the resolved
+names are using a non-standard TLD (top level domain) of '.lan'.
+The Webalizer will map this special TLD to mean "Local Network" in
+its reports, which allows local traffic to be grouped separately
+from normal Internet traffic.  Lastly, you may notice that the
+last line of the file contains an entry with the same IP address
+where a name should be.  This entry will prevent the Webalizer
+from ever trying to lookup 127.0.0.1,  which is the 'localhost'
+address, when it is found in a log.  The second number after the IP
+address (1) tells the Webalizer that it is an unresolved entry, not
+a resolved hostname (ie: has no name).  Entries such as this one can
+be used to reduce DNS lookups on addresses that are known not to
+resolve.
+
+
+Considerations
+--------------
+
+Processing of live log files is discouraged, as the chances of log records
+being written between the time of DNS resolution and normal processing will
+cause problems.
+
+If you are using STDIN for the input stream (log file) and have run-time
+DNS cache file creation/update enabled.. the program will exit after the
+cache file has been created/updated and no output will be produced.  If
+you must use STDIN for the input log, you will need to process the stream
+twice, once to create/update the cache file, and again to produce the
+reports.  The reason for this is that stream inputs from STDIN cannot
+be 'rewound' to the beginning like files can, so must be given twice.
+
+Cached DNS addresses have a default TTL (time to live) of 7 days.  This
+may now be changed using the CacheTTL config file keyword to any value
+from 1 to 100 (days).  You may also now specify if unresolved addresses
+should be stored in the DNS cache.  Normally, unresolved IP addresses
+are NOT saved in the cache and are looked up each time the program is
+run.
+
+There is an absolute maximum of 100 child processes that may be created,
+however the actual number of children should be significantly less than
+the maximum.. typical usage should be between 5 and 20.
+
+Special thanks to Henning P. Schmiedehausen <hps@tanstaafl.de> for the
+original dns-resolver code he submitted,  which was the basis for this
+implementation.  Also thanks to Jose Carlos Medeiros for the inital IPv6
+support code.
+