Lynx (Windows) and encoding

HP · Post by HP » 11 June 2006, 09:21

OK, I have a small problem with Lynx:

As you can see in this screenshot, the problem occurs whether the page is encoded in ISO-8859-1 or Windows-1252 (I know, a Windows encoding, but I don't yet have filters that automatically replace every character specific to that charset with something else, such as the infamous MS Word apostrophes).

Accented characters are displayed as in the screenshot, even when they are written as HTML entities, so I suspect the problem comes from my Lynx build, which I got from this page: http://www.pervalidus.net/cygwin/lynx/
If my Lynx installation is at fault, where could I find a Lynx that works properly under Windows (running from a USB key)?

Thanks in advance for your advice.
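The "filter" HP mentions could look roughly like the sketch below: a minimal Python example, assuming the goal is to map a few Windows-1252 typographic characters (Word's curly apostrophes and quotes, dashes, ellipsis, the oe ligature) to plain stand-ins so the text fits in ISO-8859-1. The mapping table and the `to_latin1_safe` name are hypothetical; a real filter would need a fuller table.

```python
# Hypothetical filter: maps a few Windows-1252 "smart" characters that have
# no ISO-8859-1 equivalent to plain ASCII stand-ins.
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (Word's typographic apostrophe)
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u0152": "OE",   # Latin capital ligature OE
    "\u0153": "oe",   # Latin small ligature oe
})


def to_latin1_safe(text: str) -> str:
    """Replace the mapped characters, then check the result fits in ISO-8859-1.

    Raises UnicodeEncodeError if a character outside ISO-8859-1 remains
    (for example the euro sign, which this sketch does not map).
    """
    cleaned = text.translate(SMART_TO_ASCII)
    cleaned.encode("iso-8859-1")  # validation only; the bytes are discarded
    return cleaned


if __name__ == "__main__":
    print(to_latin1_safe("l\u2019\u00e9t\u00e9 \u2026"))  # l'été ...
```

Raising on leftovers, rather than silently dropping them, makes it obvious which characters still need a mapping.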

teoli2003 · Post by **teoli2003** » 11 June 2006, 09:34

In my opinion this is a problem with the terminal's encoding configuration, not with Lynx. It may be worth looking in that direction.
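This also explains why HTML entities do not help: the browser first resolves `&eacute;` into the character "é", then still has to encode that character in the display charset before sending it to the terminal. If that charset does not match the terminal's code page, the result is garbled either way. A minimal Python sketch, assuming a console working in CP850; `render_for_terminal` is a hypothetical helper, not Lynx's code.

```python
import html


def render_for_terminal(fragment: str, display_charset: str) -> bytes:
    """Resolve HTML entities, then encode the text in the display charset."""
    text = html.unescape(fragment)  # "&eacute;" becomes the character "é"
    return text.encode(display_charset, errors="replace")


# The same entity produces different bytes depending on the display charset.
as_latin1 = render_for_terminal("caf&eacute;", "iso-8859-1")  # b'caf\xe9'
as_cp850 = render_for_terminal("caf&eacute;", "cp850")        # b'caf\x82'

if __name__ == "__main__":
    # A console working in CP850 reads those bytes with its own table:
    print(as_latin1.decode("cp850"))  # cafÚ  (byte 0xE9 is 'Ú' in CP850)
    print(as_cp850.decode("cp850"))   # café
```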

Sent with: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.1a3) Gecko/20060609 BonEcho/2.0a3

HP · Post by HP » 11 June 2006, 09:42

Good thinking...

Does an installer take care of setting the terminal encoding?

Because I installed this Lynx: http://csant.info/lynx.html
and it works on the PC...
OK, I'll see whether I can make it portable.

Thanks for the first lead.

HP · Post by HP » 11 June 2006, 09:58

OK, it looks like it comes from lynx.cfg...
but that file is such a mess.
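Because most of lynx.cfg is comments (including commented-out example settings), it can help to list only the lines that are actually active. A minimal Python sketch; `active_settings` is a hypothetical helper, and its rules are assumptions rather than Lynx's own parser: lines starting with `#` are comments, lines starting with `.` are documentation markers such as `.h2 CHARACTER_SET`, keys are upper-cased, and INCLUDE lines are listed but not followed.

```python
def active_settings(cfg_text: str) -> dict[str, list[str]]:
    """Collect the uncommented KEY:value lines of a lynx.cfg-style file.

    Assumptions (not taken from Lynx's source): '#' starts a comment line,
    '.' starts a documentation marker line, and INCLUDE is not followed.
    """
    settings: dict[str, list[str]] = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ".")):
            continue
        # Split on the first ':' only, so values such as D:\... stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a KEY:value line
        settings.setdefault(key.strip().upper(), []).append(value.strip())
    return settings


SAMPLE = """\
# CHARACTER_SET:iso-8859-1
.h2 CHARACTER_SET
CHARACTER_SET:cp850

ASSUME_CHARSET:utf-8
#ASSUME_LOCAL_CHARSET:iso-8859-1
"""

if __name__ == "__main__":
    for key, values in active_settings(SAMPLE).items():
        print(key, values)
```

Keeping every value in a list, rather than only the last one, shows when a key is set more than once, for example once in an INCLUDEd file and again locally.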

calimo · Post by **calimo** » 11 June 2006, 10:06

If I'm not mistaken, the Windows terminal uses CP-850 or 852 (I can never remember which), an old IBM encoding. But normally Lynx should be able to take that into account.
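Which of the two code pages is in use matters for French text: CP850 is the Western European DOS code page and CP852 the Central European one, so they do not cover the same accented letters. A minimal Python sketch to check coverage; `unencodable` is a hypothetical helper, and the exact set CP852 lacks is best seen by running it rather than taken from here.

```python
def unencodable(text: str, charset: str) -> set[str]:
    """Return the characters of `text` that `charset` cannot represent."""
    missing = set()
    for ch in set(text):
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            missing.add(ch)
    return missing


# Common French accented letters (lowercase only, for brevity).
FRENCH = "àâçéèêëîïôùûü"

if __name__ == "__main__":
    for cp in ("cp850", "cp852"):
        print(cp, sorted(unencodable(FRENCH, cp)))
```

Note that even CP850 lacks the "œ" ligature, so a French page can still contain characters no DOS Western code page can show.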

That said, it's odd, because here is what I get:

The accented characters have disappeared. Note that I get the same thing on Geckozone

(but my terminal is in UTF-8, so that's to be expected if Lynx sends an 8-bit encoding).
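Why 8-bit output looks broken on a UTF-8 terminal: in ISO-8859-1, "è" and "é" are single bytes (0xE8, 0xE9), which are not valid UTF-8 on their own. Depending on the terminal, such bytes are drawn as a replacement glyph or dropped; Python's `errors="replace"` and `errors="ignore"` approximate those two behaviours in this minimal sketch (an illustration, not a model of any particular terminal).

```python
page_text = "caract\u00e8res accentu\u00e9s"      # "caractères accentués"
sent = page_text.encode("iso-8859-1")           # 8-bit bytes, as from an 8-bit display charset

# A UTF-8 terminal cannot decode the lone bytes 0xE8 and 0xE9.
replaced = sent.decode("utf-8", errors="replace")  # U+FFFD in place of è and é
dropped = sent.decode("utf-8", errors="ignore")    # the accented letters vanish

# With UTF-8 as the display charset, the text round-trips intact.
round_trip = page_text.encode("utf-8").decode("utf-8")

if __name__ == "__main__":
    print(replaced)
    print(dropped)     # caractres accentus
    print(round_trip)  # caractères accentués
```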

On the other hand, no problems with Elinks or W3M (but W3M is buggy!).

Sent with: Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.8.0.4) Gecko/20060508 Firesalamandre Firefox/1.5.0.4

HP · Post by HP » 11 June 2006, 10:11

calimo wrote: If I'm not mistaken, the Windows terminal uses CP-850 or 852 (I can never remember which), an old IBM encoding. But normally Lynx should be able to take that into account.

As it happens, I just found this in lynx.cfg:

# The value should be the MIME name of a character set recognized by
# Lynx (case insensitive).
# Find RFC 1345 at http://www.ics.uci.edu/pub/ietf/uri/rfc1345.txt .
#
CHARACTER_SET:cp850

That matches what you were suggesting...

HP · Post by HP » 11 June 2006, 10:20

Yes!
I found it...

The zipped version only comes with a 1 KB lynx.cfg:

INCLUDE:D:\Program Files\Lynx\lynx.cfg.dist

STARTFILE:file://localhost/~/

HELPFILE:file://localhost/D:/Program Files/Lynx/lynx_help/lynx_help_main.html.gz

SAVE_SPACE:~/
DEFAULT_CACHE_SIZE:100
DEFAULT_VIRTUAL_MEMORY_SIZE:5120000
SOURCE_CACHE:MEMORY
PERSISTENT_COOKIES:TRUE
NO_DOT_FILES:FALSE
PRETTYSRC:TRUE
JUSTIFY:FALSE

The bare minimum!

But that isn't enough for the encodings to be handled...
so you have to look at a "complete" version

and merge the two, because if you use a complete lynx.cfg as-is, you are likely (to put it mildly) to crash on every startup, since Lynx will try to access files it won't find.
So here is the minimal cfg that still handles encodings:

INCLUDE:D:\Program Files\Lynx\lynx.cfg.dist

STARTFILE:file://localhost/~/

HELPFILE:file://localhost/D:/Program Files/Lynx/lynx_help/lynx_help_main.html.gz

SAVE_SPACE:~/
DEFAULT_CACHE_SIZE:100
DEFAULT_VIRTUAL_MEMORY_SIZE:5120000
SOURCE_CACHE:MEMORY
PERSISTENT_COOKIES:TRUE
NO_DOT_FILES:FALSE
PRETTYSRC:TRUE
JUSTIFY:FALSE

.h1 Character sets

.h2 CHARACTER_SET
# CHARACTER_SET defines the display character set, i.e., assumed to be
# installed on the user's terminal.  It determines which characters or strings
# will be used to represent 8-bit character entities within HTML.  New
# character sets may be defined as explained in the README files of the
# src/chrtrans directory in the Lynx source code distribution.  For Asian (CJK)
# character sets, it also determines how Kanji code will be handled.  The
# default is defined in userdefs.h and can be changed here or via the
# 'o'ptions menu.  The 'o'ptions menu setting will be stored in the user's RC
# file whenever those settings are saved, and thereafter will be used as the
# default.  For Lynx a "character set" has two names:  a MIME name (for
# recognizing properly labeled charset parameters in HTTP headers etc.), and a
# human-readable string for the 'O'ptions Menu (so you may find info about
# language or group of languages besides MIME name).  Not all 'human-readable'
# names correspond to exactly one valid MIME charset (example is "Chinese");
# in that case an appropriate valid (and more specific) MIME name should be
# used where required.  Well-known synonyms are also processed in the code.
#
# Raw (CJK) mode
#
# Lynx normally translates characters from a document's charset to display
# charset, using ASSUME_CHARSET value (see below) if the document's charset
# is not specified explicitly.  Raw (CJK) mode is OFF for this case.
# When the document charset is specified explicitly, that charset
# overrides any assumption like ASSUME_CHARSET or raw (CJK) mode.
#
# For the Asian (CJK) display character sets, the corresponding charset is
# assumed in documents, i.e., raw (CJK) mode is ON by default.  In raw CJK
# mode, 8-bit characters are not reverse translated in relation to the entity
# conversion arrays, i.e., they are assumed to be appropriate for the display
# character set.  The mode should be toggled OFF when an Asian (CJK) display
# character set is selected but the document is not CJK and its charset not
# specified explicitly.
#
# Raw (CJK) mode may be toggled by user via '@' (LYK_RAW_TOGGLE) key,
# the -raw command line switch or from the 'o'ptions menu.
#
# Raw (CJK) mode effectively changes the charset assumption about unlabeled
# documents.  You can toggle raw mode ON if you believe the document has a
# charset which does correspond to your Display Character Set.  On the other
# hand, if you set ASSUME_CHARSET the same as Display Character Set you get raw
# mode ON by default (but you get assume_charset=iso-8859-1 if you try raw mode
# OFF after it).
#
# Note that "raw" does not mean that every byte will be passed to the screen.
# HTML character entities may get expanded and translated, inappropriate
# control characters filtered out, etc.  There is a "Transparent" pseudo
# character set for more "rawness".
#
# Since Lynx now supports a wide range of platforms it may be useful to note
# the cpXXX codepages used by IBM PC compatible computers, and windows-xxxx
# used by native MS-Windows apps.  We also note that cpXXX pages rarely are
# found on Internet, but are mostly for local needs on DOS.
#
# Recognized character sets include:
#
.nf
#    string for 'O'ptions Menu          MIME name
#    ===========================        =========
#    7 bit approximations (US-ASCII)    us-ascii
#    Western (ISO-8859-1)               iso-8859-1
#    Western (ISO-8859-15)              iso-8859-15
#    Western (cp850)                    cp850
#    Western (windows-1252)             windows-1252
#    IBM PC US codepage (cp437)         cp437
#    DEC Multinational                  dec-mcs
#    Macintosh (8 bit)                  macintosh
#    NeXT character set                 next
#    HP Roman8                          hp-roman8
#    Chinese                            euc-cn
#    Japanese (EUC-JP)                  euc-jp
#    Japanese (Shift_JIS)               shift_jis
#    Korean                             euc-kr
#    Taipei (Big5)                      big5
#    Vietnamese (VISCII)                viscii
#    Eastern European (ISO-8859-2)      iso-8859-2
#    Eastern European (cp852)           cp852
#    Eastern European (windows-1250)    windows-1250
#    Latin 3 (ISO-8859-3)               iso-8859-3
#    Latin 4 (ISO-8859-4)               iso-8859-4
#    Baltic Rim (cp775)                 cp775
#    Baltic Rim (windows-1257)          windows-1257
#    Cyrillic (ISO-8859-5)              iso-8859-5
#    Cyrillic (cp866)                   cp866
#    Cyrillic (windows-1251)            windows-1251
#    Cyrillic (KOI8-R)                  koi8-r
#    Arabic (ISO-8859-6)                iso-8859-6
#    Arabic (cp864)                     cp864
#    Arabic (windows-1256)              windows-1256
#    Greek (ISO-8859-7)                 iso-8859-7
#    Greek (cp737)                      cp737
#    Greek2 (cp869)                     cp869
#    Greek (windows-1253)               windows-1253
#    Hebrew (ISO-8859-8)                iso-8859-8
#    Hebrew (cp862)                     cp862
#    Hebrew (windows-1255)              windows-1255
#    Turkish (ISO-8859-9)               iso-8859-9
#    ISO-8859-10                        iso-8859-10
#    Ukrainian Cyrillic (cp866u)        cp866u
#    Ukrainian Cyrillic (KOI8-U)        koi8-u
#    UNICODE (UTF-8)                    utf-8
#    RFC 1345 w/o Intro                 mnemonic+ascii+0
#    RFC 1345 Mnemonic                  mnemonic
#    Transparent                        x-transparent
.fi
#
# The value should be the MIME name of a character set recognized by
# Lynx (case insensitive).
# Find RFC 1345 at http://www.ics.uci.edu/pub/ietf/uri/rfc1345.txt .
#
CHARACTER_SET:cp850


.h2 ASSUME_CHARSET
# ASSUME_CHARSET changes the handling of documents which do not
# explicitly specify a charset.  Normally Lynx assumes that 8-bit
# characters in those documents are encoded according to iso-8859-1
# (the official default for the HTTP protocol).  When ASSUME_CHARSET
# is defined here or by an -assume_charset command line flag is in effect,
# Lynx will treat documents as if they were encoded accordingly.
# See above on how this interacts with "raw mode" and the Display
# Character Set.
# ASSUME_CHARSET can also be changed via the 'o'ptions menu but will
# not be saved as permanent value in user's .lynxrc file to avoid more chaos.
#
ASSUME_CHARSET:utf-8


.h2 ASSUMED_DOC_CHARSET_CHOICE
.h2 DISPLAY_CHARSET_CHOICE
# It is possible to reduce the number of charset choices in the 'O'ptions menu
# for "display charset" and "assumed document charset" fields via
# DISPLAY_CHARSET_CHOICE and ASSUMED_DOC_CHARSET_CHOICE settings correspondingly.
# Each of these settings can be used several times to define the set of possible
# choices for corresponding field. The syntax for the values is
#
#	string | prefix* | *
#
# where
#
#	'string' is either the MIME name of charset or it's full name (listed
#		either in the left or in the right column of table of
#		recognized charsets), case-insensitive - e.g.  'Koi8-R' or
#		'Cyrillic (KOI8-R)' (both without quotes),
#
#	'prefix' is any string, and such value will select all charsets having
#		the name with prefix matching given (case insensitive), i.e.,
#		for the charsets listed in the table of recognized charsets,
#
.ex
# ASSUMED_DOC_CHARSET_CHOICE:cyrillic*
#		will be equal to specifying
.ex 4
# ASSUMED_DOC_CHARSET_CHOICE:cp866
# ASSUMED_DOC_CHARSET_CHOICE:windows-1251
# ASSUMED_DOC_CHARSET_CHOICE:koi8-r
# ASSUMED_DOC_CHARSET_CHOICE:iso-8859-5
#		or lines with full names of charsets.
#
#	literal string '*' (without quotes) will enable all charset choices
#		in corresponding field.  This is useful for overriding site
#		defaults in private pieces of lynx.cfg included via INCLUDE
#		directive.
#
# Default values for both settings are '*', but any occurrence of settings
# with values that denote any charsets will make only listed choices available
# for corresponding field.
#ASSUMED_DOC_CHARSET_CHOICE:*
#DISPLAY_CHARSET_CHOICE:*


.h2 ASSUME_LOCAL_CHARSET
# ASSUME_LOCAL_CHARSET is like ASSUME_CHARSET but only applies to local
# files.  If no setting is given here or by an -assume_local_charset
# command line option, the value for ASSUME_CHARSET or -assume_charset
# is used.  It works for both text/plain and text/html files.
# This option will ignore "raw mode" toggling when local files are viewed
# (it is "stronger" than "assume_charset" or the effective change
# of the charset assumption caused by changing "raw mode"),
# so only use when necessary.
#
#ASSUME_LOCAL_CHARSET:iso-8859-1


.h2 PREPEND_CHARSET_TO_SOURCE
# PREPEND_CHARSET_TO_SOURCE:TRUE tells Lynx to prepend a META CHARSET line
# to text/html source files when they are retrieved for 'd'ownloading
# or passed to 'p'rint functions, so HTTP headers will not be lost.
# This is necessary for resolving charset for local html files,
# while the assume_local_charset is just an assumption.
# For the 'd'ownload option, a META CHARSET will be added only if the HTTP
# charset is present.  The compilation default is TRUE.
# It is generally desirable to have charset information for every local
# html file, but META CHARSET string potentially could cause
# compatibility problems with other browsers, see also PREPEND_BASE_TO_SOURCE.
# Note that the prepending is not done for -source dumps.
#
#PREPEND_CHARSET_TO_SOURCE:TRUE


.h2 NCR_IN_BOOKMARKS
# NCR_IN_BOOKMARKS:TRUE allows you to save 8-bit characters in bookmark titles
# in the unicode format (NCR).  This may be useful if you need to switch
# display charsets frequently.  This is the case when you use Lynx on different
# platforms, e.g., on UNIX and from a remote PC, and want to keep the bookmarks
# file persistent.
# Another aspect is compatibility:  NCR is part of I18N and HTML4.0
# specifications supported starting with Lynx 2.7.2, Netscape 4.0 and MSIE 4.0.
# Older browser versions will fail so keep NCR_IN_BOOKMARKS:FALSE if you
# plan to use them.
#
#NCR_IN_BOOKMARKS:FALSE


.h2 FORCE_8BIT_TOUPPER
# FORCE_8BIT_TOUPPER overrides locale settings and uses internal 8-bit
# case-conversion mechanism for case-insensitive searches in non-ASCII display
# character sets.  It is FALSE by default and should not be changed unless
# you encounter problems with case-insensitive searches.
#
#FORCE_8BIT_TOUPPER:FALSE


.h2 OUTGOING_MAIL_CHARSET
# While Lynx supports different platforms and display character sets
# we need to limit the charset in outgoing mail to reduce
# trouble for remote recipients who may not recognize our charset.
# You may try US-ASCII as the safest value (7 bit), any other MIME name,
# or leave this field blank (default) to use the display character set.
# Charset translations currently are implemented for mail "subjects= " only.
#
#OUTGOING_MAIL_CHARSET:


.h2 ASSUME_UNREC_CHARSET
# If Lynx encounters a charset parameter it doesn't recognize, it will
# replace the value given by ASSUME_UNREC_CHARSET (or a corresponding
# -assume_unrec_charset command line option) for it.  This can be used
# to deal with charsets unknown to Lynx, if they are "sufficiently
# similar" to one that Lynx does know about, by forcing the same
# treatment.  There is no default, and you probably should leave this
# undefined unless necessary.
#
#ASSUME_UNREC_CHARSET:iso-8859-1

.h2 PREFERRED_LANGUAGE
# PREFERRED_LANGUAGE is the language in MIME notation (e.g., "en",
# "fr") which will be indicated by Lynx in its Accept-Language headers
# as the preferred language.  If available, the document will be
# transmitted in that language.  Users can override this setting via
# the 'o'ptions menu and save that preference in their RC file.
# This may be a comma-separated list of languages in decreasing preference.
#
#PREFERRED_LANGUAGE:en


.h2 PREFERRED_CHARSET
# PREFERRED_CHARSET specifies the character set in MIME notation (e.g.,
# "ISO-8859-2", "ISO-8859-5") which Lynx will indicate you prefer in
# requests to http servers using an Accept-Charsets header.  Users can
# change it via the 'o'ptions menu and save that preference in their RC file.
# The value should NOT include "ISO-8859-1" or "US-ASCII",
# since those values are always assumed by default.
# If a file in that character set is available, the server will send it.
# If no Accept-Charset header is present, the default is that any
# character set is acceptable.  If an Accept-Charset header is present,
# and if the server cannot send a response which is acceptable
# according to the Accept-Charset header, then the server SHOULD send
# an error response with the 406 (not acceptable) status code, though
# the sending of an unacceptable response is also allowed.  See RFC 2068
# (http://www.ics.uci.edu/pub/ietf/uri/rfc2068.txt).
#
#PREFERRED_CHARSET:

I left in the comments from the original file I copied; if that is a problem for disk space (file size), feel free to strip the superfluous bytes.
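The config above also sets ASSUME_CHARSET:utf-8. According to the comments in the file, this changes the charset Lynx assumes for documents that do not declare one (normally iso-8859-1). A minimal Python sketch of why that assumption matters, in both directions; `decode_unlabeled` is a hypothetical helper that loosely models the idea, not Lynx's decoding logic.

```python
def decode_unlabeled(body: bytes, assume_charset: str = "iso-8859-1") -> str:
    """Decode a document that carries no charset label using an assumed charset.

    Loose illustration of an ASSUME_CHARSET-style setting, not Lynx's code.
    """
    return body.decode(assume_charset, errors="replace")


utf8_page = "\u00e9t\u00e9".encode("utf-8")          # "été" -> b'\xc3\xa9t\xc3\xa9'
latin1_page = "\u00e9t\u00e9".encode("iso-8859-1")   # "été" -> b'\xe9t\xe9'

if __name__ == "__main__":
    print(decode_unlabeled(utf8_page))             # Ã©tÃ©  (default iso-8859-1 assumption)
    print(decode_unlabeled(utf8_page, "utf-8"))    # été
    print(decode_unlabeled(latin1_page, "utf-8"))  # two U+FFFD around the "t"
```

The choice is a trade-off: assuming UTF-8 fixes unlabeled UTF-8 pages but garbles unlabeled ISO-8859-1 pages, and vice versa; pages that declare their charset are not affected either way.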

calimo · Post by **calimo** » 11 June 2006, 11:15

The solution for Ubuntu: http://forum.ubuntu-fr.org/viewtopic.ph ... 88#p337388
That configuration file really was hard to find.

Sent with: Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.8.0.4) Gecko/20060508 Firesalamandre Firefox/1.5.0.4

HP · Post by HP » 11 June 2006, 12:55

calimo wrote: If I'm not mistaken, the Windows terminal uses CP-850 or 852 (I can never remember which), an old IBM encoding. But normally Lynx should be able to take that into account.

On the other hand, no problems with Elinks or W3M (but W3M is buggy!).
http://img118.imageshack.us/img118/3900 ... tu10na.png

Actually, the tests were aimed more at the home page than at the index page,
because the home page finally has:
a "full div/CSS" header,
and overall a better separation of style and content...

but above all the header, which I tried to make as semantic as possible (title + "subtitle/description" + menu of accessibility anchor links).
There are surely still improvements to make, but I think it's better than the old header:

<div align="center">
<table style="background-color:#345487; width:98%; max-width:98%" cellpadding="0" cellspacing="0"  class="forumline">
            <tr>
             <td style="background-image:url(/head.png); width:300px; height:95px">
	     <a title="Articles - ..." href="/articles/index.php">
	     <img src="/image_left.png" style="width:300px; height:95px; border:0" alt="Articles - ..." title="" /></a></td>
	     <td style="background-image:url(/head.png); width:100%">&nbsp;</td>
	     <td style="background-image:url(/head.png); width:100px">
	     <a title="Index - Forum ..." href="/forum/index.php">
	     <img src="/image_right.png" style="width:100px; height:95px; border:0" alt="Index - Forum ..." title=""></a></td>
            </tr>
        </table>
</div>

Instead, we now have:

<div class="global">
 <div id="header">
   <h1><a title="Index - Forum ..." id="top_right" href="/forum/index.php"><span>Forum ...</span></a></h1>
   <h3><a title="Articles - ..." id="top_left" href="/articles/index.php"><span>Aqua###philie technique, articles aqua###philes et forum aqua###phile</span></a></h3>
 </div>

<p id="prelude">
  <a href="#discussions">Aller au contenu</a> <b>::</b>
  <a href="#menu">Aller au menu</a> <b>::</b>
  <a href="#recherche">Aller &agrave; la recherche</a>
</p>

</div>

An h3 because it serves as a description: I wanted it emphasized with an hx heading, but an h2 didn't feel right here.
No more img tags; instead, clickable areas. I prefer that for "decorative images" such as menus, and eventually I plan to use the same approach for the RSS images/links.
A menu with anchors to the various parts of the page (this didn't exist in the table version).

And a header div inside a global div, for centering in IE, of course.

Otherwise, for documentation about Lynx:
http://dominique.guebey.club.fr/tekno/lynx/lynx_prt.htm
