21-Oct-88 05:55:25-PDT,41684;000000000000
From: Peter King
Date: Fri, 21 Oct 88 13:18:50 BST
Subject: refer -> BiBTeX conversion
Following feedback from a number of users, in particular Jonathan
Bowen, I have modified and updated my 'ref2bib' script for conversion
of refer databases to BiBTeX format.
The files can now be extracted with a name of the user's choosing, to
avoid clashes with Jonathan Bowen's 'ref2bib', although that remains
the default.
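
As a rough sketch of how the extracted name is chosen (it mirrors the
PRGRM test at the top of the archive; the function name choose_name and
the alternative name refer2bib are made up for illustration):

```shell
#!/bin/sh
# Sketch only: default to 'ref2bib' unless a name is supplied.
# choose_name is a hypothetical helper, not part of the archive.
choose_name() {
    name=ref2bib
    if [ -n "${1:-}" ]
    then
        name=$1
    fi
    echo "$name"
}

choose_name              # prints ref2bib
choose_name refer2bib    # prints refer2bib
```

In practice this amounts to giving the new name as a second argument to
sh, after the name of the file in which you saved the archive.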
The heuristics for assigning the type of reference have been augmented,
and a number of options are now selectable at run time, including the
number of authors, and the number of letters taken from each author's
name, to be used in generating keys.
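
To give an idea of the key generation, here is a minimal sketch, not the
script's actual code: it takes the leading letters of each author's last
name and appends the last two digits of the year. The function make_key,
its argument order and the sample authors are assumptions made for the
example; it also assumes an awk that accepts -v. The real script also
skips troff escapes and adds 'a', 'b', ... to keys that are already in use.

```shell
#!/bin/sh
# Sketch only. make_key LETTERS AUTHORS YEAR  (authors on stdin, one per line)
# LETTERS corresponds to the -l option, AUTHORS to the -k option.
make_key() {
    awk -v lkey="$1" -v maxauthor="$2" -v year="$3" '
    NR <= maxauthor {
        name = $NF                      # last word is taken as the surname
        n = 0
        for (i = 1; i <= length(name) && n < lkey; i++) {
            c = substr(name, i, 1)
            if (c ~ /[A-Za-z]/) {       # only letters go into the key
                key = key c
                n++
            }
        }
    }
    END {
        if (key == "") key = "ANON"     # no usable author name
        print key substr(year, 3, 2)    # last two digits of a 4-digit year
    }'
}

# sample data, invented for the example
printf '%s\n' 'Donald E. Knuth' 'Leslie Lamport' | make_key 3 3 1988
# prints KnuLam88
```

The defaults of three letters and three authors can be changed with the
-l and -k options described in the usage message.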
Since the script is so long, it will probably be stored in the
appropriate archives, but I am happy to mail it to people who do not
have easy access to them.
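
For those who want a feel for the type heuristics before fetching the
script, the first few tests can be sketched as below. This is a
simplification under stated assumptions: the function classify and its
0/1 arguments are hypothetical, only the journal, book-article and report
cases are shown, and the later refinements (theses, manuals, books) are
left out.

```shell
#!/bin/sh
# Sketch only. Each argument is 1 if the refer field is present, else 0:
# classify J B R C I V N
classify() {
    J=$1 B=$2 R=$3 C=$4 I=$5 V=$6 N=$7
    if [ "$J" = 1 ]
    then
        # a %J entry with a city, a publisher, or neither volume nor
        # issue number is treated as a conference paper
        if [ "$C" = 1 ] || [ "$I" = 1 ] || { [ "$V" = 0 ] && [ "$N" = 0 ]; }
        then
            echo InProceedings
        else
            echo Article
        fi
    elif [ "$B" = 1 ]
    then
        echo InCollection
    elif [ "$R" = 1 ]
    then
        echo TechReport
    else
        echo other      # remaining cases are not covered by this sketch
    fi
}

classify 1 0 0 0 0 1 1    # prints Article
classify 1 0 0 1 0 1 0    # prints InProceedings
```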
Peter King, Computer Science Department JANET: pjbk@uk.ac.hw.cs
Heriot-Watt University ARPA: pjbk@cs.hw.ac.uk
79 Grassmarket, Edinburgh EH1 2HJ or pjbk%cs.hw.ac.uk@ucl-cs
Phone: (+44) 31 225 6465 Ext. 555 UUCP: ..!ukc!cs.hw.ac.uk!pjbk
%%%
%%% This version also includes a bug fix that was submitted by Peter
%%% on 10/25/88. Malcolm Brown, TeXhax moderator
%%%
-----cut here-----------
export PATH || exec /bin/sh $0 $*
: "This is a shar archive; use /bin/sh to extract"
: "Extracted files will be owned by you, and will have"
: "default permissions"
: "to have a name other than the default 'ref2bib' give"
: "the new name as a second argument to sh, after this shar file name"
PRGRM=ref2bib
if [ -n "$1" ]
then
PRGRM=$1
fi
PATH=/bin:/usr/bin
echo If this archive is complete, \"End of archive\" will appear at the end
echo Extracting $PRGRM
sed 's/^X//' <<'End-of-file' >$PRGRM
X#!/bin/sh
X#
X# shell script to convert refer (or bib) databases to BiBTeX format
X#
X# Most of the shell script is based on that by Jonathan Bowen
X# The awk and sed scripts are the work of Peter King
X# with some ideas stolen from Jonathan Bowen
X#
X#
XSEDFILE=${TMP-/tmp}/ref2b$$.sed
XAWKFILE=${TMP-/tmp}/ref2b$$.awk
XKEYFILE=${TMP-/tmp}/ref2b$$.key
XNKEYFILE=${TMP-/tmp}/ref2b$$.nkey
XAKEYFILE=
XCAPSFILE=
XPROGNAME=`basename $0`
XNAUTHOR=3
XLAUTHOR=3
XERRFILE=${PROGNAME}.errs
XDEFAULTWIDTH=72
XWIDTH=$DEFAULTWIDTH
XNAMEDFILES=false
XBIB=bib
XDEBUG=false
X
XGEN=`date`" on "`hostname`
XNAME=$BIB
XSTDIN=''
XNEWFILE=""
X
Xwhile expr X$1 : X'-' > /dev/null
Xdo
X case "$1" in
X -|-0|-w)
X WIDTH=-1
X ;;
X -[1-9]|-[1-9][0-9]|-[1-9][0-9][0-9])
X WIDTH=`expr X"$1" : X'-\(.*\)'`
X ;;
X -c) : capitals file
X CAPSFILE="$2"
X shift
X ;;
X -d) : Enable debugging
X DEBUG=true
X ;;
X -e) : error file
X ERRFILE="$2"
X shift
X ;;
X -kf) : key used file
X AKEYFILE=$2
X shift
X ;;
X -k) : no of authors in key
X NAUTHOR=$2
X shift
X ;;
X -l) : no of authors name letters to use
X LAUTHOR=$2
X shift
X ;;
X -n) : Named output files
X NAMEDFILES=true
X ;;
X -u|-U)
X echo "Usage: $PROGNAME [ options ] [ file ... ]
XConverts Unix \"refer\" format to \"BibTeX\" database format.
X-c file Use file as source of protected words in titles
X-d enable debugging (default=$DEBUG)
X-e file error output in \"file\" (default=$ERRFILE)
X-kf file Use file to initialise keys used
X-k N Use N authors' names for key (default=$NAUTHOR)
X-l N Use N characters of each author's name for key (default=$LAUTHOR)
X-n output to named files (ext \".$BIB\") (default=$NAMEDFILES)
X-w no maximum width
X-u display usage information
X-N maximum width of N characters (1-999) (default=$DEFAULTWIDTH)"
X exit 0
X ;;
X -*)
X echo "Usage: $PROGNAME [ -[width] ] [ file ... ]"
X exit 0
X ;;
X esac
X shift
Xdone
X
Xtrap "rm -f $SEDFILE $AWKFILE $KEYFILE $NKEYFILE ;exit" 0 1 2 3 15
X
Xcat /dev/null > $ERRFILE
Xcat /dev/null > $KEYFILE
X[ $AKEYFILE ] && [ -r $AKEYFILE ] && cat $AKEYFILE > $KEYFILE
X
X# although # introduces a comment to sed, this is undocumented,
X# and we might as well strip it first if the file is to be used many times
X# but this should work with cat replacing the sed -e here
X
Xsed -e '/^#/d' << 'ZZ' >$SEDFILE
X#
X# sed script to do some of the ref to bib database conversion
X#
X# written by Peter King, Heriot-Watt University
X# You may do anything you like with this code
X# EXCEPT claim that you wrote it
X#
X# remove as many redundant characters as possible
Xs/	/ /g
Xs/  */ /g
X# convert dashes
Xs/ - / --- /g
Xs/ $//
X#
X# First alter the TeX special characters
X# but not the first %
Xs/\(.\)%/\1{\\%}/g
X#
Xs/[&#_{}]/{\\&}/g
X# you may want to leave dollars (especially if you have eqn in your refs
Xs/\$/\\$/g
X# convert the special characters and accents from troff to BibTeX
X# assumes the accents are those of the Berkeley -ms with .AM
X#
X# convert any font rubbish (assumes that \fR is the default )
X/\\\\*f./{
X s/\\\\*f[I2]/{\\em /g
X s/\\\\*f[B3]/{\\bf /g
X s/\\\\*f[1PR]/}/g
X}
X/\\\\*[-*0('`^:~_/o,v".8?!QU ]/{
X s/\(.\)\\\\*\*'/{\\'\1}/g
X s/\(.\)\\\\*\*`/{\\`\1}/g
X s/\(.\)\\\\*\*^/{\\^\1}/g
X s/\(.\)\\\\*\*:/{\\"\1}/g
X s/\(.\)\\\\*\*~/{\\~\1}/g
X s/\(.\)\\\\*\*_/{\\=\1}/g
X s/\([oO]\)\\\\*\*\//{\\\1}/g
X s/\([aA]\)\\\\*\*o/{\\\1\1}/g
X s/\(.\)\\\\*\*,/{\\c{\1}}/g
X s/\(.\)\\\\*\*v/{\\v{\1}}/g
X s/\(.\)\\\\*\*"/{\\H{\1}}/g
X s/\(.\)\\\\*\*\./{\\d{\1}}/g
X s/\\\\*\*8/{\\ss}/g
X s/\\\\*\*?/{?`}/g
X s/\\\\*\*!/{!`}/g
X s/\\\\*\*(P\([lL]\)/{\\\1}/g
X s/\\\\*\*(ae\//{\\ae}/g
X s/\\\\*\*(Ae\//{\\AE}/g
X s/\\\\*\*(oe\//{\\oe}/g
X s/\\\\*\*(Oe\//{\\OE}/g
X# quotes
X s/\\\\*\*Q/``/g
X s/\\\\*\*U/''/g
X s/\\\\*\*-/---/g
X# convert \[space] to \0 for convenience
X s/\\\\* /\\0/g
X# \0 as space between surname de\0Souza etc.
X s/\\\\*0\([a-z]*\)\\\\*0/\\0\1 /g
X s/ \([a-z]*\)\\\\*0/ \1 /g
X# but trap the ones that start with a capital letter and convert them to
X# ties
X s/\\\\*0/~/g
X#
X# now deal with special characters and Greek
X s/\\\\*(hy/-/g
X s/\\\\*(em/---/g
X s/\\\\*(co/{\\copyright}/g
X s/\\\\*(sc/{\\S}/g
X s/\\\\*(if/$\\infty$/g
X s/\\\\*(\*a/$\\alpha$/g
X s/\\\\*(\*b/$\\beta$/g
X s/\\\\*(\*g/$\\gamma$/g
X s/\\\\*(\*d/$\\delta$/g
X s/\\\\*(\*e/$\\epsilon$/g
X s/\\\\*(\*z/$\\zeta$/g
X s/\\\\*(\*y/$\\eta$/g
X s/\\\\*(\*h/$\\theta$/g
X s/\\\\*(\*i/$\\iota$/g
X s/\\\\*(\*k/$\\kappa$/g
X s/\\\\*(\*l/$\\lambda$/g
X s/\\\\*(\*m/$\\mu$/g
X s/\\\\*(\*n/$\\nu$/g
X s/\\\\*(\*c/$\\xi$/g
X s/\\\\*(\*o/$o$/g
X s/\\\\*(\*p/$\\pi$/g
X s/\\\\*(\*r/$\\rho$/g
X s/\\\\*(\*s/$\\sigma$/g
X s/\\\\*(\*t/$\\tau$/g
X s/\\\\*(\*u/$\\upsilon$/g
X s/\\\\*(\*f/$\\phi$/g
X s/\\\\*(\*x/$\\chi$/g
X s/\\\\*(\*q/$\\psi$/g
X s/\\\\*(\*w/$\\omega$/g
X s/\\\\*(\*A/A/g
X s/\\\\*(\*B/B/g
X s/\\\\*(\*G/$\\Gamma$/g
X s/\\\\*(\*D/$\\Delta$/g
X s/\\\\*(\*E/E/g
X s/\\\\*(\*Z/Z/g
X s/\\\\*(\*Y/H/g
X s/\\\\*(\*H/$\\Theta$/g
X s/\\\\*(\*I/I/g
X s/\\\\*(\*K/K/g
X s/\\\\*(\*L/$\\Lambda$/g
X s/\\\\*(\*M/M/g
X s/\\\\*(\*N/N/g
X s/\\\\*(\*C/$\\Xi$/g
X s/\\\\*(\*O/$O$/g
X s/\\\\*(\*P/$\\Pi$/g
X s/\\\\*(\*R/P/g
X s/\\\\*(\*S/$\\Sigma$/g
X s/\\\\*(\*T/T/g
X s/\\\\*(\*U/$\\Upsilon$/g
X s/\\\\*(\*F/$\\Phi$/g
X s/\\\\*(\*X/X/g
X s/\\\\*(\*Q/$\\Psi$/g
X s/\\\\*(\*W/$\\Omega$/g
X}
X# Now trap title words that must be capitalised
X/^%[^T]/b
X#
X# first all words that are all capitals (at least two consecutive)
X# we need the slashes to allow for M/M/1 queues
Xs;[A-Z][A-Z/][A-Z/0-9]*;{&};g
X# and single letters (except A!)
Xs/\([{ ]\)\([B-Z]\)\([ :\.,}]\)/\1{\2}\3/g
Xs/ \([B-Z]\)$/ {\1}/
XZZ
X
X# Now append to this any proper name that might appear in titles
X# Use sed to generate a sed script !
X# the input is a list of words which must remain capitalised
X# one to a line
X# a trailing * will be converted into a pattern at the end to get Markov,
X# Markovian, etc.
X
Xcat - $CAPSFILE << ZZ | sed -e '/^#/d;s/\*$/[^ -,;]*/;s=^..*$=s/&/{\&}/g=' >> $SEDFILE
X# File of Title information that must maintain its capitalised state
X#
X# some proper names
X#
X# first some mathematicians
XAbel
XBernoulli
XBessel
XBeta
XBorel
XCauchy
XChurch
XRosser
XDedekind
XDescartes
XDirichlet
XEuclid*
XEuler
XFibonacci
XFermat
XFourier
XFresnel
XFrobenius
XPerron
XGamma
XGauss*
XHadamard
XHilbert
XHorner
XHolder
XJacobi*
XJensen
XMarkov*
XArnoldi
XLaplace
XLaguerre
XLagrang*
XLegendre
XLeibnitz
XLipschitz
X# this is really Poincare (acute accent), but the accent processing will disrupt it
XPoincar
XHermite
XRayleigh
XRitz
XRiemann
X# this is really Rouche (acute accent), but the accent processing will disrupt it
XRouch
XStieltjes
XSteiner
XSchwarz
XWeibull
XWald
XKronecker
XKarmarkar
XKendall
XDiophantine
XDelbrouck
XBayes*
XSchafer
XDempster
XRunge
XKutta
XPollaczek
XKhinchin
XPalm
XErlang
XEngset
XLittle's
XKosten
XGittins
XFeller
XCox*
XPoisson
XChapman
XKolmogorov
XSmirnov
XWiener
XHopf
XStirling
X
X# computing
XBuzen
XGordon
XNewell
XLemoine
XPierce
XJackson
XNewhall
XTuring
XNorton
XPetri
XWilkinson
XSkinner
XHarrison
XCambridge
XEthernet
XAloha
X
X# coding theory
XHamming
XHuffman
XReed
XShannon
XSolomon
XViterbi
XZZ
X
X# strip out the comments from the AWKFILE, just to cut down on the character count
X# that awk needs to read before starting. For a single file conversion
X# this will probably make no difference
X# but this should work with cat replacing the sed -e here
X
Xsed -e '/^[ ]*#/d' -e '/#/s/[ ]*#.*$//' << 'ZZ' > $AWKFILE
X#
X# awk script to convert refer (or bib) format databases
X# to BiBTeX format.
X#
X# written by Peter King, Heriot-Watt University
X# use freely, but don't claim that you wrote it
X#
X# Generates keys using authors names and year
X#
X# You may wish to alter treatment of key fields that are ignored
X# such as %U %W %Y etc.
X#
X# NB because awk recognises end of line as a statement terminator
X# you cannot reliably split the long lines in the script
X#
X# regular expressions should be sorted according to frequency
X# so that minimal tests are made
X# From tests in a local data base the order given appears quite good
X# 2883 %A
X# 1813 blank lines
X# 1774 %T
X# 1764 %D
X# 1505 %P
X# 1347 %J
X# 1331 %V
X# 1201 %N
X# 773 .. continuation lines
X# 501 %C
X# 424 %I
X# 192 %B
X# 187 %E
X# 92 %S
X# 89 %R
X# 33 %X
X# 30 %K
X# 16 %O
X#
XBEGIN {
X # suffices for the key generation process
X for(i=1;i<=26;i++)
X addkey[i] = substr("abcdefghijklmnopqrstuvwxyz",i,1)
X # standard BiBTeX types so that you can change the case to BOOK
X # or book, as required. Not all are used
X article = "Article" # the @ will be added later
X book = "Book"
X booklet = "Booklet"
X inbook = "InBook"
X incollection = "InCollection"
X inproceedings = "InProceedings"
X manual = "Manual"
X mastersthesis = "MastersThesis"
X misc = "Misc"
X phdthesis = "PhDThesis"
X proceedings = "Proceedings"
X techreport = "TechReport"
X unpublished = "Unpublished"
X
X bibtype = misc
XZZ
X
X# the following 'echo' commands communicate shell variables to
X# the awk script.
X
Xecho >> $AWKFILE lkey = $LAUTHOR # number of characters used from authors to make key
Xecho >> $AWKFILE maxauthor = $NAUTHOR # maximum number of authors to use in
X # constructing key
Xecho >> $AWKFILE errfil = \"$ERRFILE\"
Xecho >> $AWKFILE keyfil = \"$KEYFILE\"
Xecho >> $AWKFILE nkeyfil = \"$NKEYFILE\"
Xecho >> $AWKFILE lwidth = $WIDTH
Xecho >> $AWKFILE print \"@Comment{ Database converted by $PROGNAME\\n\\t$GEN}\\n\"
X
Xsed -e '/^[ ]*#/d' -e '/#/s/[ ]*#.*$//' << 'ZZ' >> $AWKFILE
X rx = 1
X percent = 0 # not in a reference
X }
X
X/[^{$]\\/ || /^\\/ { # we've protected all the \ we introduced in {}
X err = 1
X print "Non translated \\ symbol : Reference " rx >> errfil
X print $0 >> errfil
X }
X
X/^%/ { # any % entry
X percent = 1 # in a reference
X entry = substr($0,4)
X pctchar = substr($0,2,1)
X }
X
X/^%[AQ]/ { # authors (multiple occurrences possible)
X A ++
X if ( $1 == "%A" ) authors[A] = entry
X else authors[A] = "{"entry"}" # corporate authors need protection!
X if (A> maxauthor) next
X # generate the key string
X ic = 0
X lc = 1
X if ( $1 == "%A" ) keyfield = $NF
X else keyfield = $2 # corporate authors use first word
X while(ic < lkey && lc <= length(keyfield) ){
X kc = substr( keyfield, lc, 1)
X if ( kc ~ /[a-zA-Z]/ ){
X keys = keys kc
X ic++
X if (ic==lkey) next
X }
X else if ( kc == "\\" ) lc ++
X lc ++
X }
X next
X }
X
X/^$/ { # blank line
X if (percent) {
X # end of a reference
X refs ++
X if (T==0) print "No title : Reference "refs" "keys >> errfil
X if (A==0) print "No author : Reference "refs" "keys >> errfil
X if (D==0) print "No date : Reference "refs" "keys >> errfil
X if ((!T)||(!A)||(!D))err=1
X
X
X # date processing
X nf = split(date,z)
X year = date
X kyear = year
X if(nf>1){
X xmonth = z[1]
X mm = 1
X i = 2
X while ( z[i] !~ /^[0-9][0-9][0-9][0-9]/ ) {
X xmonth = xmonth " " z[i]
X i++
X mm = 0
X }
X month = "{ " xmonth " }"
X
X year = z[i]
X kyear = year # in case there is any extraneous info
X i++
X while ( i <= nf ) {
X year = year " " z[i]
X i++
X }
X if((mm) && ( xmonth ~ /^[A-Za-z.]*$/ )){
X #if the month is only letters and .
X if(xmonth ~ /^Ja/) month = "jan"
X if(xmonth ~ /^Fe/) month = "feb"
X if(xmonth ~ /^Mar/) month = "mar"
X if(xmonth ~ /^Ap/) month = "apr"
X if(xmonth ~ /^May/) month = "may"
X if(xmonth ~ /^Jun/) month = "jun"
X if(xmonth ~ /^Jul/) month = "jul"
X if(xmonth ~ /^Aug/) month = "aug"
X if(xmonth ~ /^Se/) month = "sep"
X if(xmonth ~ /^O/) month = "oct"
X if(xmonth ~ /^N/) month = "nov"
X if(xmonth ~ /^D/) month = "dec"
X }
X }
X if ( year !~ /^[0-9][0-9]*$/ ) year = "{ " year " }"
X
X # sort out the editors
X if (E) {
X tew = split(alleditors,z)
X E = 1
X editor[E] = z[1]
X sc = " "
X i = 2
X while( i <= tew ){
X if ( z[i] == "and" ) {
X E++
X editor[E] = ""
X sc = ""
X }
X else {
X lastc = substr(z[i],length(z[i]),1)
X if(lastc == "," && z[i+1] !~ /^Jn*r\./ ) {
X editor[E] = editor[E] sc substr(z[i], 1, length(z[i])-1)
X if ( z[i+1] != "and" ) {
X E++
X editor[E] = ""
X sc = ""
X }
X }
X else {
X editor[E] = editor[E] sc z[i]
X sc = " "
X }
X }
X i++
X }
X }
X
X # classify the reference
X
X if (J) {
X #journal or conference
X
X if (C||I||((!V)&&(!N))) {
X # its a conference if there is a city, publisher
X # or no volume or issue number
X conf++
X bibtype = inproceedings
X jnl = "Booktitle"
X jtype = "Conference proceedings"
X }
X else {
X jour ++
X bibtype = article
X jnl = "Journal"
X jtype = "Journal"
X }
X
X if ( B||E||R||(!P)) {
X err=1
X if (B||E||R)
X print "Journal & book?: Reference "refs" "keys >> errfil
X if (!P) print "No page nos.? : Reference "refs" "keys >> errfil
X }
X if (err){
X print jtype " reference in error" >> errfil
X }
X }
X else
X if (B) {
X # article in book
X bibtype = incollection
X
X if (N||R||(!E)||(!I)||(!C)||(!P)||(V&&(!S))){
X err=1
X if (!E) print "No editor? Reference "refs" " keys >> errfil
X if (!I) print "No publisher? Reference "refs" " keys >> errfil
X if (!C) print "No city? Reference "refs" " keys >> errfil
X if (!P) print "No page nos.? Reference "refs" " keys >> errfil
X if (V&&(!S)) print "Volume but no Series Reference "refs" " keys >> errfil
X if (N) print "Issue no.? Reference "refs" " keys >> errfil
X if (R) print "Report? Reference "refs" " keys >> errfil
X
X }
X if (err) print "Article in book reference in error" >> errfil
X }
X else
X if (R) {
X #report
X bibtype = techreport
X
X if (E||N){
X err=1
X if (N) print "Issue no.? Reference "refs" " keys >> errfil
X if (E) print "Editor? Reference "refs" " keys >> errfil
X }
X if (err) print "Report reference in error" >> errfil
X if ( report ~ /M.*[Tt]hesis/ ) bibtype = mastersthesis
X if ( report ~ /M.*[Dd]issert/ ) bibtype = mastersthesis
X if ( report ~ /D.*[Tt]hesis/ ) bibtype = phdthesis
X if ( report ~ /D.*[Dd]issert/ ) bibtype = phdthesis
X # Working papers and preprints are classified as techreports
X # if ( report ~ /[Ww]orking/ ) bibtype = unpublished
X # if ( report ~ /[Pp]reprint/ ) bibtype = unpublished
X if ( report ~ /[Uu]npublish/ ) bibtype = unpublished
X if ( report ~ /[Mm]anual/ ) bibtype = manual
X
X if (bibtype== unpublished ) {
X if (other == "") other = report
X else other = report " " other
X R = 0
X report = ""
X O = 1
X }
X else
X if (bibtype == manual ) {
X TY = 1
X type = report
X # the publisher is really the organisation
X report = publisher
X R = I
X I = 0
X institute = "Organization"
X }
X else
X if ( bibtype == techreport ) {
X trw = split(report,z)
X if ( trw == 1 ) {
X N = 1
X number = z[1]
X }
X else {
X if ( ( z[1] ~ /^Tech/ ) && ( z[2] ~ /^Rep/ ) )
X {
X z[1] = ""
X z[2] = ""
X TY = 0
X number = z[3]
X for(i=4;i<=trw;i++)
X number = number " " z[i]
X if (trw >=3) N = 1
X }
X else {
X type = z[1]
X i = 2
X while( i <= trw && ( z[i] !~ /^[0-9A-Z-]*$/ ) ) {
X type = type " " z[i]
X i++
X }
X if(i> errfil
X if (N) print "Issue no.? Reference "refs" " keys >> errfil
X if (E) print "Editor? Reference "refs" " keys >> errfil
X if (V&&(!S)) print "Volume but no Series Reference "refs" " keys >> errfil
X }
X if (err) print "Book reference in error" >> errfil
X
X if (S && (( series ~ /[Tt]ech.*[Rr]eport/ ) || ( series ~ /[Tt]ech.*[Mm]ono/ ))){
X bibtype = techreport
X type = series
X TY = 0
X if ( series ~ /[Mm]ono/ ) TY = 1
X institute = "Institution"
X R = I
X I = 0
X report = publisher
X publisher = ""
X }
X }
X else {
X bibtype = misc
X err=1
X if( other ~ /[Uu]npublished/ ) bibtype = unpublished
X print "Unclassified reference" >> errfil
X }
X
X # generate key
X if(keys == "") keys = "ANON"
X keys = keys substr(kyear,3,2)
X if(keyused[keys] >=1) {
X key_suffix = keyused[keys]++
X keys = keys addkey[key_suffix]
X }
X else keyused[keys] = 1
X
X if (err) {
X print "Key: " keys >> errfil
X if(A) for (i=1;i<=A;i++)
X print "%A " authors[i] >> errfil
X if(T) print "%T " title >> errfil
X if(J) print "%J "journal >> errfil
X if(B) print "%B "booktitle >> errfil
X if(V) print "%V "volume >> errfil
X if(N) print "%N "number >> errfil
X if(I) print "%I "publisher >> errfil
X if(C) print "%C "city >> errfil
X if(E) for (i=1;i<=E;i++)
X print "%E "editor[i] >> errfil
X if(S) print "%S "series >> errfil
X if(P) print "%P "pages >> errfil
X if(R) print "%R "report >> errfil
X if(D) print "%D "date >> errfil
X if(O) print "%O "other >> errfil
X print "" >> errfil
X }
X
X if ( other ~ /[Uu]npublished/ ) bibtype = unpublished
X if ( other ~ /[Ee]dition/ ) {
X twc = split(other,z)
X for(i=1;i<=twc;i++)
X if(z[i] ~ /[Ee]dition/) {
X edition = z[i-1]
X z[i-1] = ""
X z[i] = ""
X }
X other = z[1];
X for(i=2;i<=twc;i++) {
X other = other " " z[i]
X }
X if(!(other ~ /[^ ]/ )) {
X # no non space characters
X other = ""
X O = 0
X }
X }
X
X if(O&&H) other = header " " other
X else if(H) {
X O = H
X other = header
X }
X
X if (J) {
X
X # substitute the journal abbreviations from the standard styles
X
X journal = "{ " journal " }"
X # {acmcs} {"ACM Computing Surveys"}
X if ( journal ~ /Comp.* Sur/ ) journal = "acmcs"
X # {acta} {"Acta Informatica"}
X if ( journal ~ /Acta Inf/ ) journal = "acta"
X # {cacm} {"Communications of the ACM"}
X if ( journal ~ /Com.* ACM/ ) journal = "cacm"
X if ( journal ~ /CACM/ ) journal = "cacm"
X # {ibmjrd} {"IBM Journal of Research and Development"}
X if ( journal ~ /IBM J.*R.*D/ ) journal = "ibmjrd"
X # {ibmsj} {"IBM Systems Journal"}
X if ( journal ~ /IBM Sy.*J/ ) journal = "ibmsj"
X # {ieeese} {"IEEE Transactions on Software Engineering"}
X if ( journal ~ /IEEE Tran.*Soft.*Eng/ ) journal = "ieeese"
X # {ieeetc} {"IEEE Transactions on Computers"}
X if ( journal ~ /IEEE Tran.*Computers/ ) journal = "ieeetc"
X # {ieeetcad}
X if ( journal ~ /IEEE Tran.*Comp.*Desig/ ) journal = "ieeetcad"
X # {ipl} {"Information Processing Letters"}
X if ( journal ~ /Inf.*Proc.*Lett/ ) journal = "ipl"
X # {jacm} {"Journal of the ACM"}
X if ( journal ~ /Jou.* ACM/ ) journal = "jacm"
X if ( journal ~ /JACM/ ) journal = "jacm"
X # {jcss} {"Journal of Computer and System Sciences"}
X if ( journal ~ /J.*Comp.*Sys.*Sc/ ) journal = "jcss"
X # {scp} {"Science of Computer Programming"}
X if ( journal ~ /Sc.*Comp.*Prog/ ) journal = "scp"
X # {sicomp} {"SIAM Journal on Computing"}
X if ( journal ~ /SIAM .*Comp/ ) journal = "sicomp"
X # {tocs} {"ACM Transactions on Computer Systems"}
X if ( journal ~ /ACM Tran.*Comp.*Sys/ ) journal = "tocs"
X # {tods} {"ACM Transactions on Database Systems"}
X if ( journal ~ /ACM Tran.*Data.*Sys/ ) journal = "tods"
X # {tog} {"ACM Transactions on Graphics"}
X if ( journal ~ /ACM Tran.*Grap/ ) journal = "tog"
X # {toms} {"ACM Transactions on Mathematical Software"}
X if ( journal ~ /ACM Tran.*Math.*Soft/ ) journal = "toms"
X # {toois} {"ACM Transactions on Office Information Systems"}
X if ( journal ~ /ACM Tran.*Off.*Inf.*Sys/ ) journal = "toois"
X # {toplas} {"ACM Transactions on Programming Languages and Systems"}
X if ( journal ~ /ACM Tran.*Prog.*Lan.*Sys/ ) journal = "toplas"
X # {tcs} {"Theoretical Computer Science"}
X if ( journal ~ /Th.*Comp.*Sci/ ) journal = "tcs"
X
X }
X
X if(lwidth>0) { # split lines of potential over length
X # titles, book titles, notes, abstracts, addresses,
X # journals ( which may be conference proceedings) and institutions
X if(T){
X twc = split(title,z)
X title = z[1];
X lt = length(z[1])+13+length("Title")
X for(i=2;i<=twc;i++) {
X if(lt + length(z[i]) >= lwidth)
X {sc = "\n\t\t";lt = 15;}
X else sc = " ";
X title = title sc z[i]
X lt += length(z[i]) + 1
X }
X }
X
X if(J){ # it may be a conference
X twc = split(journal,z)
X journal = z[1];
X if(twc>1) { # if its 1 then we have an abbreviation
X lt = length(z[1])+11+length(jnl)
X for(i=2;i<=twc;i++) {
X if(lt + length(z[i]) >= lwidth)
X {sc = "\n\t\t";lt = 15;}
X else sc = " ";
X journal = journal sc z[i]
X lt += length(z[i]) + 1
X }
X }
X }
X
X if(B){
X twc = split(booktitle,z)
X booktitle = z[1];
X lt = length(z[1])+13+length("Booktitle")
X for(i=2;i<=twc;i++) {
X if(lt + length(z[i]) >= lwidth)
X {sc = "\n\t\t";lt = 15;}
X else sc = " ";
X booktitle = booktitle sc z[i]
X lt += length(z[i]) + 1
X }
X }
X
X if(C){
X twc = split(city,z)
X city = z[1];
X lt = length(z[1])+13+length("Address")
X for(i=2;i<=twc;i++) {
X if(lt + length(z[i]) >= lwidth)
X {sc = "\n\t\t";lt = 15;}
X else sc = " ";
X city = city sc z[i]
X lt += length(z[i]) + 1
X }
X }
X
X if(O){
X twc = split(other,z)
X other = z[1];
X lt = length(z[1])+13+length("Note")
X for(i=2;i<=twc;i++) {
X if(lt + length(z[i]) >= lwidth)
X {sc = "\n\t\t";lt = 15;}
X else sc = " ";
X other = other sc z[i]
X lt += length(z[i]) + 1
X }
X }
X
X if(R){
X twc = split(report,z)
X report = z[1];
X lt = length(z[1])+13+length( institute )
X for(i=2;i<=twc;i++) {
X if(lt + length(z[i]) >= lwidth)
X {sc = "\n\t\t";lt = 15;}
X else sc = " ";
X report = report sc z[i]
X lt += length(z[i]) + 1
X }
X }
X
X if(X){
X twc = split(abstr,z)
X abstr = z[1];
X lt = length(z[1])+13+length("Annote")
X for(i=2;i<=twc;i++) {
X if(lt + length(z[i]) >= lwidth)
X {sc = "\n\t\t";lt = 15;}
X else sc = " ";
X abstr = abstr sc z[i]
X lt += length(z[i]) + 1
X }
X }
X }
X
X # fiddle fields that might contain only digits
X if ( volume !~ /^ *[0-9][0-9]* *$/ ) volume = "{ " volume " }"
X if ( number !~ /^ *[0-9][0-9]* *$/ ) number = "{ " number " }"
X if ( pages !~ /^ *[0-9][0-9]* *$/ ) pages = "{ " pages " }"
X # print the reference
X
X bibs[bibtype] ++;
X printf "@%s{\t%s",bibtype,keys
X if(A) {
X printf ",\n\tAuthor = { %s",authors[1]
X for(i=2;i<=A;i++) printf " and\n\t\t%s",authors[i]
X printf " }"
X }
X if(TY) printf ",\n\tType = { %s }", type
X if(T) printf ",\n\tTitle = { %s }", title
X if(B) printf ",\n\tBooktitle = { %s }", booktitle
X if(E) {
X printf ",\n\tEditor = { %s", editor[1]
X for(i=2;i<=E;i++) printf " and\n\t\t%s", editor[i]
X printf " }"
X }
X if(J) printf ",\n\t%s = %s", jnl, journal
X if(S) printf ",\n\tSeries = { %s }", series
X if(V) printf ",\n\tVolume = %s", volume
X if(N) printf ",\n\tNumber = %s", number
X if(P) printf ",\n\tPages = %s", pages
X if(O) printf ",\n\tNote = { %s }", other
X if( edition != "" )
X printf ",\n\tEdition = { %s }", edition
X if(R) printf ",\n\t%s = { %s }", institute, report
X # Non standard fields start
X if(G) printf ",\n\tGovernmentNo = { %s }", govtorder
X if(M) printf ",\n\tBellLabsMemo = { %s }", bellmemo
X # Non standard end
X if(I) printf ",\n\tPublisher = { %s }", publisher
X if(C) printf ",\n\tAddress = { %s }", city
X if(month != "") printf ",\n\tMonth = %s", month
X if(D) printf ",\n\tYear = %s", year
X if(L) printf ",\n\tKey = { %s }", label
X if(K) printf ",\n\tKeywords = { %s }", keywords
X if(X) printf ",\n\tAnnote = { %s }", abstr
X printf "\t}\n\n"
X
X # initialise for next reference
X
X A=0;B=0;C=0;D=0;E=0;F=0;G=0;H=0;I=0;J=0;
X K=0;L=0;M=0;N=0;O=0;P=0;Q=0;R=0;S=0;T=0;
X U=0;V=0;W=0;X=0;Y=0;Z=0; TY=0;
X bibtype = misc
X edition = ""
X type = ""
X institute = ""
X booktitle = ""
X title = ""
X volume = ""
X alleditors = ""
X city = ""
X date = ""
X month = ""
X xmonth = ""
X kyear = ""
X year = ""
X publisher = ""
X journal = ""
X number = ""
X other = ""
X pages = ""
X report = ""
X series = ""
X abstr = ""
X bellmemo = ""
X govtorder = ""
X keywords = ""
X label = ""
X keys = ""
X toterr +=err
X rx++
X }
X else if (comment) print "\t}\n"
X err = 0
X percent = 0 # not in a reference
X comment = 0 # not in a comment
X pctchar = ""
X next
X }
X
X/^%T/ {
X T ++
X if (T>1) {
X err=1
X print "Two titles: Reference " rx >> errfil
X print title >> errfil
X }
X title = entry
X next
X }
X
X/^%D/ {
X D ++
X if (D>1) {
X err=1
X print "Two dates: Reference " rx >> errfil
X print date >> errfil
X }
X if (($NF<1900)||($NF>=2000)) {
X err=1
X print "Date error? : Reference " rx >> errfil
X }
X date = entry
X next
X }
X
X/^%P/ {
X P ++
X if ( P>1 ) {
X err=1
X print "Two page nos? : Reference " rx >> errfil
X print pages >> errfil
X }
X pages = entry
X next
X }
X
X/^%J/ {
X J ++
X if ( J>1 ) {
X err=1
X print "Two journals: Reference " rx >> errfil
X print journal >> errfil
X }
X journal = entry
X next
X }
X
X/^%V/ {
X V ++
X if ( V>1 ) {
X err=1
X print "Two volumes: Reference " rx >> errfil
X print volume >> errfil
X }
X volume = entry
X next
X }
X
X/^%N/ {
X N ++
X if ( N>1 ) {
X err=1
X print "Two issue numbers: Reference " rx >> errfil
X print number >> errfil
X }
X number = entry
X next
X }
X
X/^[^%]/ { # non-blank-non-% lines
X
X if(FILENAME == keyfil )
X keyused[$1] = $2
X else
X if(percent)
X { # in the references
X
X if( pctchar == "A") authors[A] = authors[A] " " $0
X if( pctchar == "B") booktitle = booktitle " " $0
X if( pctchar == "C") city = city " " $0
X if( pctchar == "D") date = date " " $0
X if( pctchar == "E") alleditors = alleditors " " $0
X if( pctchar == "G") govtorder = govtorder " " $0
X if( pctchar == "H") header = header " " $0
X if( pctchar == "I") publisher = publisher " " $0
X if( pctchar == "J") journal = journal " " $0
X if( pctchar == "K") keywords = keywords " " $0
X if( pctchar == "L") label = label " " $0
X if( pctchar == "M") bellmemo = bellmemo " " $0
X if( pctchar == "N") number = number " " $0
X if( pctchar == "O") other = other " " $0
X if( pctchar == "P") pages = pages " " $0
X if( pctchar == "Q") authors[A] = authors[A] " " $0
X if( pctchar == "R") report = report " " $0
X if( pctchar == "S") series = series " " $0
X if( pctchar == "T") title = title " " $0
X if( pctchar == "V") volume = volume " " $0
X if( pctchar == "X") abstr = abstr " " $0
X }
X else {
X if (!comment) print "@Comment{"
X print $0
X comment = 1
X }
X next
X }
X
X/^%C/ {
X C ++
X if ( C>1 ) {
X err=1
X print "Two cities: Reference " rx >> errfil
X print city >> errfil
X }
X city = entry
X next
X }
X
X/^%I/ {
X I ++
X if ( I>1 ) {
X err=1
X print "Two publishers: Reference " rx >> errfil
X print publisher >> errfil
X }
X publisher = entry
X next
X }
X
X/^%B/ {
X B ++
X if ( B>1 ) {
X err=1
X print "Two books: Reference " rx >> errfil
X print booktitle >> errfil
X }
X booktitle = entry
X next
X }
X
X/^%E/ { # this really deals with 'bib' format which allows multiple
X # %E fields
X # refer only allows one %E field
X # we split it when the reference is printed
X E++
X if ( alleditors == "" ) alleditors = entry
X else alleditors = alleditors " and " entry
X next
X }
X
X/^%O/ {
X O ++
X if ( O>1 ) {
X err=1
X print "Two others: Reference " rx >> errfil
X print other >> errfil
X }
X other = entry
X next
X }
X
X/^%H/ {
X H ++
X if ( H>1 ) {
X err=1
X print "Two headers: Reference " rx >> errfil
X print header >> errfil
X }
X header = entry
X next
X }
X
X/^%S/ {
X S ++
X if ( S>1 ) {
X err=1
X print "Two series: Reference " rx >> errfil
X print series >> errfil
X }
X series = entry
X next
X }
X
X/^%R/ {
X R ++
X if ( R>1 ) {
X err=1
X print "Two reports: Reference " rx >> errfil
X print report >> errfil
X }
X report = entry
X next
X }
X
X/^%X/ {
X X ++
X abstr = entry
X if ( X>1 ) {
X err=1
X print "Two abstracts: Reference " rx >> errfil
X }
X next
X }
X
X/^%K/ {
X K++
X if (K>1) {
X err=1
X print "Two keywords: Reference " rx >> errfil
X print keywords >> errfil
X }
X keywords = entry
X next
X }
X
X/^%L/ {
X L++
X if (L>1) {
X err=1
X print "Two labels: Reference " rx >> errfil
X print label >> errfil
X }
X label = entry
X next
X }
X
X/^%G/ {
X G++
X if (G>1) {
X err=1
X print "Two Gov't order Nos: Reference " rx >> errfil
X print govtorder >> errfil
X }
X govtorder = entry
X next
X }
X
X/^%M/ {
X M++
X if (M>1) {
X err=1
X print "Two Bell Labs Memo Nos: Reference " rx >> errfil
X print bellmemo >> errfil
X }
X bellmemo = entry
X next
X }
X
X/^%/ { # should not get these
X F ++
X print "Unexpected flag: Reference " rx >> errfil
X print $0 >> errfil
X err = 1
X next
X }
X
XEND {
X print refs " references" >> errfil
X if(toterr) print toterr " erroneous" >> errfil
X if(bibs[article]>0) print bibs[article], " journal articles" >> errfil
X if(bibs[book]>0) print bibs[book], " books" >> errfil
X if(bibs[booklet]>0) print bibs[booklet], " booklets" >> errfil
X if(bibs[inbook]>0) print bibs[inbook], " book extracts" >> errfil
X if(bibs[incollection]>0) print bibs[incollection], " book articles" >> errfil
X if(bibs[inproceedings]>0) print bibs[inproceedings], " conference papers" >> errfil
X if(bibs[manual]>0) print bibs[manual], " manuals" >> errfil
X if(bibs[mastersthesis]>0) print bibs[mastersthesis], " Master's theses" >> errfil
X if(bibs[misc]>0) print bibs[misc], " miscellaneous" >> errfil
X if(bibs[phdthesis]>0) print bibs[phdthesis], " PhD theses" >> errfil
X if(bibs[proceedings]>0) print bibs[proceedings], " conference proceedings" >> errfil
X if(bibs[techreport]>0) print bibs[techreport], " technical reports" >> errfil
X if(bibs[unpublished]>0) print bibs[unpublished], " unpublished papers" >> errfil
X
X for(k in keyused) print k, keyused[k] > nkeyfil
X
X }
XZZ
X
X$DEBUG && echo "Generated: <$GEN>" 1>&2
X$DEBUG && echo "Width: <$WIDTH>" 1>&2
X$DEBUG && echo "Errors in: <$ERRFILE>" 1>&2
X$DEBUG && echo "Authors used in Keys: <$NAUTHOR>" 1>&2
X$DEBUG && echo "Characters/Author used in Keys: <$LAUTHOR>" 1>&2
X
X# Process each file, or if none given, standard input
Xfor FILE in ${*-$STDIN}
Xdo
X cp $KEYFILE $NKEYFILE
X
X# First set up shell variables as required
X if [ "$FILE" = "$STDIN" ]
X then
X NEWFILE=$NAME.$BIB
X else
X if [ -r "$FILE" -a -f "$FILE" ]
X then
X NAME=`basename $FILE`
X NEWFILE=$FILE.$BIB
X else
X NAME=""
X echo "$PROGNAME: Can't read $FILE" 1>&2
X fi
X fi
X
X# If all is OK, read input and terminate with a blank line.
X if [ "$NAME" ]
X then
X if [ "$FILE" = "$STDIN" ]
X then
X# If no files given, read from standard input.
X $DEBUG && echo "Reading from standard input" 1>&2
X cat
X echo
X else
X $DEBUG && echo "Reading <$FILE>" 1>&2
X cat $FILE
X echo
X fi |
X# do the conversions
X sed -f $SEDFILE |
X awk -f $AWKFILE $KEYFILE - |
X# Finally, output to named files or standard output
X if $NAMEDFILES
X then
X $DEBUG && echo "Output to <$NEWFILE>" 1>&2
X cat > $NEWFILE
X else
X cat
X fi
X fi
X cp $NKEYFILE $KEYFILE
Xdone
X
Xecho >> $ERRFILE Key Frequencies
Xsort < $KEYFILE >> $ERRFILE
X[ $AKEYFILE ] && [ -w $AKEYFILE ] && cat $KEYFILE > $AKEYFILE
Xexit 0
End-of-file
echo Extracting $PRGRM.1
sed -e 's/^X//' -e 's/prgrm/'$PRGRM'/' <<'End-of-file' >$PRGRM.1
X.TH prgrm 1-local
X.SH NAME
Xprgrm \- convert refer input files to bibtex .bib files
X.SH SYNOPSIS
X.B prgrm
X[options ...] [files ...]
X.br
X.SH DESCRIPTION
X.B prgrm
Xreads the
X.I files
Xand produces a
X.B bibtex
Xreference list (a .bib file) on the standard output.
XIf no files are given, prgrm reads
Xstandard input.
X.PP
XA rudimentary attempt is made to convert
X.I troff
Xspecial characters and accents to the equivalent
X.I TeX
Xones.
XThe file ``prgrm.errs'' contains complaints about references that were
Xnot recognised, and other problems, as well as a summary of the
Xnumber of conversions completed.
X.PP
XSince
X.B refer
Xfiles are inherently unstructured (compared to
X.BR bibtex )
X.B prgrm
Xonly does a passable job. In particular
X.B refer
Xdoesn't require a keyword, while
X.B bibtex
Xdoes.
X.B prgrm
Xgenerates one using the following procedure:
Xby default, the first three characters of the last names of the first
Xthree authors are concatenated (preserving the capital letters), and the
Xlast two digits of the date are appended. If this key has already been used,
Xthen 'a', 'b', 'c' are appended as needed.
XThere is an optional facility to start the key usage where it left off
Xin some previous conversion, by supplying a file containing the keys used.
X.PP
XJournal entries that appear in the list of @strings in the standard
Xbibliography style files are converted to the corresponding abbreviations.
XThe %D field is converted to month and year entries if it contains two
Xfields; otherwise it is assumed to contain only the year.
XA large number of proper names, such as Hilbert, Turing, etc.,
Xwhich are often found in the titles of articles are enclosed in braces
X{} to protect them. This treatment is also applied to any strings of
Xtwo or more consecutive capital letters, or to any isolated single
Xcapital letter (except A).
XThe user can supply an extra list of names to have their capitalisation
Xprotected.
X.PP
XTo determine the type of reference that the
X.B refer
Xentry is,
X.B prgrm
Xhas to do some ``calculated guessing''. The heuristics used
Xhere, in order of precedence, are:
X.PP
X1. If it has a journal entry (%J) then it's considered to
Xbe an @article, unless there is a city entry (%C) or a publisher entry
X(%I) as well, in which case it's treated as an @inproceedings.
XIf there is no volume or number (%V or %N) entry it will be considered a
Xconference proceedings.
X.PP
X2. If it has a book entry (%B) then it's considered to
Xbe an @incollection.
X.PP
X3. If it has a report entry (%R) then it's considered to
Xbe a @techreport.
XIf the %R field contains the word ``Dissertation'' or ``Thesis'',
Xthe classification will be @phdthesis or @mastersthesis.
XIf the %R field contains the word ``Manual'' then it will be classified
Xas a @manual, and if it contains ``Unpublished''
Xit will be classified as an @unpublished.
X.PP
X4. If it has an issuer entry (%I) then it's considered to
Xbe a @book.
X.PP
X5. Otherwise it's considered to be a @misc, or @unpublished.
XAll these entries are listed in the ``prgrm.errs'' file.
XThe decision to classify it as @unpublished is made if the
Xword ``unpublished'' appears in the %O field. This word is deleted.
X.PP
XQuite often
X.B prgrm
Xwill misguess and you will need to edit (by hand) the resulting .bib
Xfile.
X.PP
XAny fields that
X.B prgrm
Xdoesn't know about are ignored (and complained about on stderr).
X.PP
XThe output is normally folded at word boundaries to ensure that
Xlines do not become too long on output.
XJournal entries that correspond to the abbreviations in the standard
Xbibliography styles are abbreviated, as are months mentioned in the date
Xfield.
X.PP
XNon-blank lines that appear outside a reference are accumulated
Xand printed as a @comment section. In fact BiBTeX would ignore them
Xas refer does, but identifying them separately seems cleaner, and might
Xmake prgrm suitable for converting refer bibliographies to scribe
Xformat.
X.SH OPTIONS
XThe following options are available:
X.TP 10
X.BI \- num
XSpecify a maximum width for the output.
XThe default is 72 characters.
XIf
X.I num
Xis omitted then lines are not folded and may be of any length.
X.TP 10
X.BI \-e " file"
XUse the next argument as the file name in which to print the errors
Xand summary of output.
X.TP 10
X.BI \-kf " file"
XUse the next argument as a file name to read the current state of key
Xusage from, and to save the key usage data at the end of the conversion.
XThis allows extra databases to be converted, without having to convert
Xall the old ones, and keeps the keys unique over all the databases.
X.TP 10
X.BI \-k " n"
XUse the next argument to decide how many authors' names to use in generating the
Xkey. (Default 3).
X.TP 10
X.BI \-l " n"
XUse the next argument to decide how many characters from each author's name
Xto use in generating the key. (Default 3).
X.TP 10
X.B \-n
XUse the name of the input file(s) to produce output file(s) with
Xthe same name and extension ``.bib'' rather than sending the output
Xto standard output.
X.TP 10
X.B \-u
XDisplay the usage of the command.
X.TP 10
X.B \-w
XDo not fold the output. Lines may be of any length.
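X.SH EXAMPLES
XThe command lines below are illustrative sketches that use only the
Xoptions described above. The file names (old.refs, new.refs, keys
Xand my.errs) are hypothetical.
X.PP
XConvert one database, sending the result to standard output:
X.PP
X.nf
X    prgrm old.refs > old.bib
X.fi
X.PP
XConvert two databases into files with the extension ``.bib'',
Xkeeping the keys unique across both by recording key usage in a file
X(the script appears to save the key data only if that file already
Xexists and is writable):
X.PP
X.nf
X    prgrm \-n \-kf keys old.refs new.refs
X.fi
X.PP
XBuild keys from the first two authors, using four characters of each
Xname, and collect the complaints and summary in a named file:
X.PP
X.nf
X    prgrm \-k 2 \-l 4 \-e my.errs old.refs > old.bib
X.fi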
X.SH ACKNOWLEDGMENT
XThis manual page is based on the manual page for
X.IR r2bib ,
Xa program which performs a simpler version of the same conversion,
Xwritten by
XRusty Wright, Center For Music Experiment, University of California San
XDiego.
XThe options and their processing are based on the
X.I ref2bib
Xwritten independently by Jonathan Bowen of the Programming Research Group,
XOxford University.
XA number of the heuristics are also copied from Jonathan Bowen's
Xscript.
X.SH AUTHOR
XPeter King, Computer Science Department, Heriot-Watt University,
XEdinburgh.
X.SH BUGS
XImplemented as a
X.I sh(1)
Xscript, using
X.I sed(1)
Xand
X.IR awk(1) .
XThis makes the conversion very slow, but also means that it is easily
Xmodified to alter the heuristics. In particular, the key generation
Xalgorithm is easily changed.
X
XThe heuristics for identifying theses, unpublished papers, etc. are
Xrather crude.
End-of-file
echo End of archive
exit 0
-------