Benutzer:Andreas Plank/BASH

Kurze Zusammenfassungen wichtiger Empfehlungen:

Muth (Better Bash Scripting, 2012)
Vreckem (Bash best practices, 2020)

Dateinamen auffinden

?      # Genau ein beliebiges Zeichen
*      # Beliebig viele (auch 0) beliebige Zeichen
[def]  # Eines der Zeichen
[^def] # Keines der angegebenen Zeichen
[!def] # Wie oben
[a-d]  # Alle Zeichen aus dem Bereich
ls -d /[a-d]* # Verzeichnisse → /bin  /boot  /dev

Sortierte Dateien vergleichen

Inhalte beider Dateien anzeigen:

cat Datei_1.txt # sollte sortiert sein	cat Datei_2.txt # sollte sortiert sein
drei-beide eins-1 fünf -1 fünf-beide zwei-1	acht-2 drei-beide fünf-beide sechs-2 zweilerlei-2 zweitens-2

Standardmäßige Ausgabe:

comm Datei_1.txt Datei_2.txt # es werden 3 Spalten ausgegeben

        acht-2
                drei-beide
eins-1
fünf  -1
                fünf-beide
        sechs-2
zwei-1
        zweilerlei-2
        zweitens-2

Es bedeuten:

Spalte 1: Ergebnis einzig aus Datei_1.txt
Spalte 2: Ergebnis einzig aus Datei_2.txt
Spalte 3: Ergebnis aus beiden Dateien

Das Kommando comm kann nun diese drei Ausgabespalten vermittels Option unterdrücken:

comm -1 unterdrücke Ausgabespalte 1 (Ergebnis übrig: Datei_2 + beide)
comm -12 unterdrücke Ausgabespalten 1 + 2 (Ergebnis übrig: aus beiden)
comm -13 unterdrücke Ausgabespalten 1 + 3 (Ergebnis übrig: Einziges aus Datei_2)
usw.

comm -23 Datei_1.txt Datei_2.txt 
# unterdrücke Ausgabespalte 2 + 3, erübrige Spalte 1, ergibt Einziges aus Datei_1

eins-1
fünf  -1
zwei-1

comm -13 Datei_1.txt Datei_2.txt 
# unterdrücke Ausgabespalte 1 + 3, erübrige Spalte 2, ergibt Einziges aus Datei_2

acht-2
sechs-2
zweilerlei-2
zweitens-2

comm -12 Datei_1.txt Datei_2.txt 
# unterdrücke Ausgabespalte 1 + 2, erübrige Spalte 3, ergibt aus beiderlei: Datei_1 und Datei_2

drei-beide
fünf-beide

Compare Two URL Lists

Assume to have two lists of URLs, one old and one new, and you want to get only those URLs that are actually new compared to the old list. The following example asumes to have CSV (comma separated values) or TSV (tab separated values) and tries to extract the very URL, regardless of any text after the URL.

For this we use command:
comm ‹-options› oldlist_sorted comparelist_sorted or
comm ‹-options› file_1_sorted file_2_sorted and this results in 3 output columns:

column-1         column-2        column-3
only-in-file_1
                 only-in-file_2
                                 in-file_1-and-2

… so using command comm you can now suppress one or two of these three output columns using the option:

comm -1 suppress output column 1 (results left: col 2 and 3, i.e. only-of-file_2 + both of in file_1-and-2)
comm -12 suppress output columns 1 + 2 (results left: col 3, i.e. from both of in file_1-and-2)
comm -13 suppress output columns 1 + 3 (results left: col 2, i.e. results only of in file_2)
comm -23 suppress output columns 2 + 3 (results left: col 1, i.e. results only of in file_1)
aso.

# # # # # # # # # # # # # #
# Check for URI-Differences old list vs. new list (in general for CSV or TSV lists)
# ```bash
# comm file1.txt file2.txt 
# LIST1-only-of-file1 LIST2-only-of-file2 LIST3-both-in-and-of-file1-file2
#
# comm -13 donelistsorted comparelistsorted > todolistsorted
# comm -13 donelistsorted comparelistsorted > todolistsorted
donelist_source=urilist_Naturalis_20220516.csv;
donelist_sorted=${donelist_source%.*}_sorted.tsv;
donelist_sorted_noprotocol=${donelist_source%.*}_sorted_noprotocol.tsv;

comparelist_source=urilist_Naturalis_20220817.tsv;
comparelist_sorted=${comparelist_source%.*}_sorted.tsv;
comparelist_sorted_noprotocol=${comparelist_source%.*}_sorted_noprotocol.tsv;

todolist_sorted=${comparelist_source%.*}_todo.tsv;
todolist_sorted_noprotocol=${comparelist_source%.*}_todo_noprotocol.tsv;

# assume CSV (comma separated values) or TSV (tab separated values)
# assume to have URLs beginning at the line start and after it (word-boundary), any other text herein after gets ignored

# compare by removing any protocol part (http:// https:// ftp:// sftp:// aso. OR remove <…>)
# without protocol
  sed --silent --regexp-extended '/[[:alpha:]]+:\/\// { s@[[:space:]]*<?[[:alpha:]]+://([^[:space:],]+)>?\b.*$@\1@; p }'  "$donelist_source"    | sort > "$donelist_sorted_noprotocol"
  sed --silent --regexp-extended '/[[:alpha:]]+:\/\// { s@[[:space:]]*<?[[:alpha:]]+://([^[:space:],]+)>?\b.*$@\1@; p }'  "$comparelist_source" | sort > "$comparelist_sorted_noprotocol"
  # only in done-list
  comm -23 "$donelist_sorted_noprotocol" "$comparelist_sorted_noprotocol" > "$todolist_sorted_noprotocol";
  # only in compare-list
  # comm -13 "$donelist_sorted_noprotocol" "$comparelist_sorted_noprotocol" > "$todolist_sorted_noprotocol";
  grep --count "/" "$todolist_sorted_noprotocol";
# with protocol
  sed --silent --regexp-extended '/[[:alpha:]]+:\/\// { s@[[:space:]]*<?([[:alpha:]]+://[^[:space:],]+)>?\b.*$@\1@; p }'  "$donelist_source"    | sort > "$donelist_sorted"
  sed --silent --regexp-extended '/[[:alpha:]]+:\/\// { s@[[:space:]]*<?([[:alpha:]]+://[^[:space:],]+)>?\b.*$@\1@; p }'  "$comparelist_source" | sort > "$comparelist_sorted"
  # only in done-list
  comm -23 "$donelist_sorted" "$comparelist_sorted" > "$todolist_sorted";
  # only in compare-list
  # comm -13 "$donelist_sorted" "$comparelist_sorted" > "$todolist_sorted";
  grep --count "/" "$todolist_sorted";
# ```

Summieren von Zahlen, Listen

# wir wollen die Spalte der Dateigrößen zusammenrechnen 1798891, 2804087 usw.
# Abhängigkeit: awk (verarbeite Textfelder und -dateien)
# Abhängigkeit: bc (eine Rechensprache für beliebige Genauigkeit)
ls -l *importsplit* | head -n 5 
# -rw-r--r-- 1 myusername myusername 1798891 Jun 30 16:32 Thread-01_botanicalcollections.be_20220509-1108_importsplit_01.rdf._normalized.ttl.trig.gz
# -rw-r--r-- 1 myusername myusername 2804087 Jun 30 16:32 Thread-01_botanicalcollections.be_20220509-1546_importsplit_01.rdf._normalized.ttl.trig.gz
# -rw-r--r-- 1 myusername myusername  862051 Jun 30 16:32 Thread-01_botanicalcollections.be_20220509-1546_importsplit_02.rdf._normalized.ttl.trig.gz
# -rw-r--r-- 1 myusername myusername 2276286 Jun 30 16:32 Thread-01_botanicalcollections.be_20220511-1106_importsplit_01.rdf._normalized.ttl.trig.gz
# -rw-r--r-- 1 myusername myusername  692749 Jun 30 16:32 Thread-01_botanicalcollections.be_20220511-1106_importsplit_02.rdf._normalized.ttl.trig.gz
ls -l *importsplit* | awk '{ print $5 }' | paste --serial --delimiters=+ - | bc
# 362150562

BASH kurz-Optionen zu lang-Optionen ersetzen (automatisiert aus Handbuch-Dokumentation (man pages))

# aus dem Handbuch von `iptables` die Optionen …
# [!] -k, --kurz-ausgeschriebeoption als auch …
#     -k, --kurz-ausgeschriebeoption …
# … herausgreifen und ein sed-Kommando daraus machen und hübsch in Spaltendarstellung
  man iptables | grep -i --extended-regexp -- '^([[:space:]]*|[[:space:]]*[\[\]!]*[[:space:]]*)-[[:digit:][:alpha:]],[[:space:]]*--' \
     | sed --regexp-extended 's/.*(-[[:alnum:]]),[[:space:]](--[[:alnum:]]+-?[[:alnum:]]+\b).*/s@ \1 @ \2 @g; #§marker§ &/g;' \
     | sort --ignore-case | column -s '#' -t | sed 's@§marker§@#@'

# Usage sed --file=short-options2long-options4rules.v4.sed rules.v4 # read changes on the screen
# Usage sed --in-place --file=short-options2long-options4rules.v4.sed rules.v4 # replace without backup
# Usage sed --in-place=.backup_20220802 --file=short-options2long-options4rules.v4.sed   # replace with backup: rules.v4.backup_20220802
#   man iptables | grep -i --extended-regexp -- '^([[:space:]]*|[[:space:]]*[\[\]!]*[[:space:]]*)-[[:digit:][:alpha:]],[[:space:]]*--' \
#      | sed -r 's/.*(-[[:alnum:]]),[[:space:]](--[[:alnum:]]+-?[[:alnum:]]+\b).*/s@ \1 @ \2 @g; # &/g;' \
#      | sort --ignore-case
#   man iptables | grep -i --extended-regexp -- '^([[:space:]]*|[[:space:]]*[\[\]!]*[[:space:]]*)-[[:digit:][:alpha:]],[[:space:]]*--' \
#      | sed -r 's/.*(-[[:alnum:]]),[[:space:]](--[[:alnum:]]+-?[[:alnum:]]+\b).*/s@ \1 @ \2 @g; #§marker§ &/g;' \
#      | sort --ignore-case | column -s '#' -t | sed 's@§marker§@#@'
s@ -4 @ --ipv4 @g;            #        -4, --ipv4
s@ -6 @ --ipv6 @g;            #        -6, --ipv6
s@^-A @--append @g;
s@ -A @ --append @g;          #        -A, --append chain rule-specification
s@ -C @ --check @g;           #        -C, --check chain rule-specification
s@ -c @ --set-counters @g;    #        -c, --set-counters packets bytes
s@ -D @ --delete @g;          #        -D, --delete chain rulenum ... -D, --delete chain rule-specification
s@ -d @ --destination @g;     #        [!] -d, --destination address[/mask][,...]
s@ -E @ --rename-chain @g;    #        -E, --rename-chain old-chain new-chain
s@ -F @ --flush @g;           #        -F, --flush [chain]
s@ -f @ --fragment @g;        #        [!] -f, --fragment
s@ -g @ --goto @g;            #        -g, --goto chain
s@ -i @ --in-interface @g;    #        [!] -i, --in-interface name
s@ -I @ --insert @g;          #        -I, --insert chain [rulenum] rule-specification
s@ -j @ --jump @g;            #        -j, --jump target
s@ -L @ --list @g;            #        -L, --list [chain]
s@ -m @ --match @g;           #        -m, --match match
s@ -N @ --new-chain @g;       #        -N, --new-chain chain
s@ -n @ --numeric @g;         #        -n, --numeric
s@ -o @ --out-interface @g;   #        [!] -o, --out-interface name
s@ -P @ --policy @g;          #        -P, --policy chain target
s@ -p @ --protocol @g;        #        [!] -p, --protocol protocol
s@ -R @ --replace @g;         #        -R, --replace chain rulenum rule-specification
s@ -S @ --list-rules @g;      #        -S, --list-rules [chain]
s@ -s @ --source @g;          #        [!] -s, --source address[/mask][,...]
s@ -t @ --table @g;           #        -t, --table table
s@ -v @ --verbose @g;         #        -v, --verbose
s@ -V @ --version @g;         #        -V, --version
s@ -w @ --wait @g;            #        -w, --wait [seconds]
s@ -X @ --delete-chain @g;    #        -X, --delete-chain [chain]
s@ -x @ --exact @g;           #        -x, --exact
s@ -Z @ --zero @g;            #        -Z, --zero [chain [rulenum]]

Redirect Errors/Standard Output

Siehe: https://www.thomas-krenn.com/de/wiki/Bash_stdout_und_stderr_umleiten

Funktion	Bash redirection
stdout -> Datei umleiten	`programm > Datei.txt`
stderr -> Datei umleiten	`programm 2> Datei.txt`
stdout UND stderr -> Datei umleiten	`programm &> Datei.txt`
stdout -> Datei umleiten UND stderr -> Datei umleiten	`programm > Datei_stdout.txt 2> Datei_stderr.txt`
stdout -> stderr	`programm 1>&2`
stderr -> stdout	`programm 2>&1`

Parameter Substitution

echo {a,b}{1,2,3} # a1 a2 a3 b1 b2 b3
# Inhalt von Archiven vergleichen
  diff <(tar tzf Buch1.tar.gz) <(tar tzf Buch.tar.gz)

d='message'
echo $d   # → message
echo ${d} # → message
# d may be not set but a local default (no definition!)
  echo ${d-default} # → default
  echo ${d-'*'}     # → *
  echo ${d-$1}      # → output of d or the first parameter given 
# d may be not set but a default definition
  echo ${d=default}
# no d + default given, but a message and procedure is than abandoned:
  echo ${d?message}
# A shell procedure that requires some parameters to be set might start as follows:
  : ${user?} ${acct?} ${bin?}
  # will print something like: "bash: user: Parameter ist Null oder nicht gesetzt."

${string/substring/replacement} # replaces the first match
${string//substring/replacement} # replaces all matches
${string#substring} # Deletes shortest match of $substring from front of $string.
${string##substring} # Deletes longest match of $substring from front of $string.
${string%substring} # Deletes shortest match of $substring from back of $string.
${string%%substring} # Deletes longest match of $substring from back of $string.

Command substitution

# commands in `...`
echo `pwd` # → /home/myusername → is the current working directory

ls `echo "$1"`
# is the same as
ls $1

set `date`; echo $6 $2 $3, $4 # → 2010 7. Dez, 17:28:44

for i in `ls -t`; do ... # list in time order (ls -t)

Extended mode

# the shell option extglob must be activated
help shopt # print help
shopt extglob
# extglob         off
shopt -s extglob; shopt extglob
# extglob         on
#shopt -u extglob; shopt extglob
## extglob         off

?(a|b|c) # Keine oder eine der eingeschlossenen Zeichenketten
*(a|b|c) # Keine oder mehrere der eingeschlossenen Zeichenketten
+(a|b|c) # Eine oder mehrere der eingeschlossenen Zeichenketten
@(a|b|c) # Genau eine der eingeschlossenen Zeichenketten
!(a|b|c) # Alle außer den eingeschlossenen Zeichenketten

# list all directory names, beginning with "bi", "*+" or "us" 
ls -d /+(bi|*+|us)*
# /bin  /lost+found  /usr  

# list all directory names, beginning not with "b*" and the 2nd character has no "o"
ls -d /!(b*|?o*)
# /cdrom  /dev  /etc  /floppy  /lib  /mnt  /opt  /proc  /sbin  /tmp  /usr  /var

Substrings

string="0123456789stop"
echo ${string:7}      # 789stop
echo ${string:0:7}    # 0123456
echo ${string:-10000} # 0123456789stop
echo ${string: -5}    # 9stop

For Loops

#!/bin/bash
# normalerweise ist IFS=" \t\n" aber Problem in for, weil Leerzeichen falsche Trennung erzeugt
OLDIFS=$IFS
IFS=$'\n'
for datei in *.{jpg,jpeg,JPG,JPEG};do
  if [ -e "$datei" ];then
    echo "$datei (jpg > png) …";
    convert "$datei" "${datei%.*}.png" 
  fi
done
IFS=$OLDIFS

Useful Commands

# sort ls listing by domain Thread-…_wu.jacq.org_ rather than numeric by Thread-01…
file_pattern="Thread-*_gat.jacq.org*2022*-[0-9][0-9][0-9][0-9]_modified.rdf.gz"
ls $file_pattern | sed -r 's@(Thread-)[0-9]+_(.+)@& \2@' | sort -k 2 | sed -r 's@^([^[:space:]]+) .+$@\1@;'

stat --printf="# \e[32mfile name %n\e[0m (%s bytes)…" file
stat --printf="# \e[32mfile name %n\e[0m was modified %y" file
stat --format='%y' file | grep --only-matching --extended-regexp '^[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}'

sort

Sorting of URLs but by domain regardless of which protocol there is (http, https, ftp aso.):

sort -t '/' -k3.1b urilist_JACQ_20220815_todo_sorted.tsv # sort by (t)able-field-character “/” 
  # -k*3*. 1 b  → set (k)ey field to sort after field 3, and from this position 
  # -k 3 .*1*b  → start sorting from the very 1st character to line end as being relevant for sorting
  # -k 3 . 1*b* → ignore (b)lanks
  sort --debug -t '/' -k3.1b urilist_JACQ_20220815_todo_sorted.tsv | head -n 6 # show what it is sorting actually

  # https://admont.jacq.org/ADMONT100002 
  # ____________________________________ for sort -t '/' -k1.1 --debug
  #        _____________________________ for sort -t '/' -k2.1 --debug
  #         ____________________________ for sort -t '/' -k3.1 --debug
  #                         ____________ for sort -t '/' -k4.1 --debug

jq ~ Get Markdown Table from Dynamic Data Values

(see the data in the hidden box, click right)

{ "head": {
    "vars": [ "cspp_example" , "institutionID" , "publisher" , "graph" ]
  } ,
  "results": {
    "bindings": [
      { 
        "cspp_example": { "type": "uri" , "value": "http://coldb.mnhn.fr/catalognumber/mnhn/p/p00039900" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/03wkt5x30" } ,
        "publisher": { "type": "uri" , "value": "https://science.mnhn.fr/institution/mnhn/collection/p/item/search" } ,
        "graph": { "type": "uri" , "value": "http://coldb.mnhn.fr/catalognumber/mnhn/p/" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "http://data.biodiversitydata.nl/naturalis/specimen/113251" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/0566bfb96" } ,
        "graph": { "type": "uri" , "value": "http://data.biodiversitydata.nl/naturalis/specimen/" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "http://id.herb.oulu.fi/0014586" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/03yj89h83" } ,
        "publisher": { "type": "literal" , "value": "http://gbif.fi" } ,
        "graph": { "type": "uri" , "value": "http://tun.fi" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "http://id.snsb.info/snsb/collection/1000/1579/1000" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/05th1v540" } ,
        "publisher": { "type": "literal" , "value": "http://www.snsb.info" } ,
        "graph": { "type": "uri" , "value": "http://id.snsb.info/snsb/" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "http://lagu.jacq.org/object/AM-02278" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/01j60ss54" } ,
        "publisher": { "type": "literal" , "value": "LAGU" } ,
        "graph": { "type": "uri" , "value": "http://lagu.jacq.org/object" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "http://specimens.kew.org/herbarium/K000989827" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/00ynnr806" } ,
        "publisher": { "type": "uri" , "value": "https://www.kew.org" } ,
        "graph": { "type": "uri" , "value": "http://specimens.kew.org/herbarium/" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "http://tbi.jacq.org/object/TBI1014287" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/051qn8h41" } ,
        "publisher": { "type": "literal" , "value": "TBI" } ,
        "graph": { "type": "uri" , "value": "http://tbi.jacq.org/object" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "http://tun.fi/MHD.107807" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/03tcx6c30" } ,
        "publisher": { "type": "literal" , "value": "http://gbif.fi" } ,
        "graph": { "type": "uri" , "value": "http://tun.fi" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "http://tun.fi/MKA.342315" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/05vghhr25" } ,
        "publisher": { "type": "literal" , "value": "http://gbif.fi" } ,
        "graph": { "type": "uri" , "value": "http://tun.fi" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "http://tun.fi/MKA.863532" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/029pk6x14" } ,
        "publisher": { "type": "literal" , "value": "http://gbif.fi" } ,
        "graph": { "type": "uri" , "value": "http://tun.fi" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://admont.jacq.org/ADMONT100680" } ,
        "institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/128466393" } ,
        "publisher": { "type": "literal" , "value": "ADMONT" } ,
        "graph": { "type": "uri" , "value": "http://admont.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://bak.jacq.org/BAK0-0000001" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/006m4q736" } ,
        "publisher": { "type": "literal" , "value": "BAK" } ,
        "graph": { "type": "uri" , "value": "http://bak.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://boz.jacq.org/BOZ000001" } ,
        "institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/128699910" } ,
        "publisher": { "type": "literal" , "value": "BOZ" } ,
        "graph": { "type": "uri" , "value": "http://boz.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://brnu.jacq.org/BRNU000205" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/02j46qs45" } ,
        "publisher": { "type": "literal" , "value": "BRNU" } ,
        "graph": { "type": "uri" , "value": "http://brnu.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://data.rbge.org.uk/herb/E00000001" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/0349vqz63" } ,
        "publisher": { "type": "uri" , "value": "http://www.rbge.org.uk" } ,
        "graph": { "type": "uri" , "value": "http://data.rbge.org.uk/herb/" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://dr.jacq.org/DR000023" } ,
        "institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/155418159" } ,
        "publisher": { "type": "literal" , "value": "DR" } ,
        "graph": { "type": "uri" , "value": "http://dr.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://ere.jacq.org/ERE0000012" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/05mpgew40" } ,
        "publisher": { "type": "literal" , "value": "ERE" } ,
        "graph": { "type": "uri" , "value": "http://ere.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://gat.jacq.org/GAT0000014" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/02skbsp27" } ,
        "publisher": { "type": "literal" , "value": "GAT" } ,
        "graph": { "type": "uri" , "value": "http://gat.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://gjo.jacq.org/GJO0000012" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/00nxtmb68" } ,
        "publisher": { "type": "literal" , "value": "GJO" } ,
        "graph": { "type": "uri" , "value": "http://gjo.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://gzu.jacq.org/GZU000000208" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/01faaaf77" } ,
        "publisher": { "type": "literal" , "value": "GZU" } ,
        "graph": { "type": "uri" , "value": "http://gzu.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://hal.jacq.org/HAL0053120" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/05gqaka33" } ,
        "publisher": { "type": "literal" , "value": "HAL" } ,
        "graph": { "type": "uri" , "value": "http://hal.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://herbarium.bgbm.org/object/B100000004" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/00bv4cx53" } ,
        "publisher": { "type": "literal" , "value": "BGBM" } ,
        "graph": { "type": "uri" , "value": "http://herbarium.bgbm.org/object/" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://id.smns-bw.org/smns/collection/275449/772800/279829" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/05k35b119" } ,
        "graph": { "type": "uri" , "value": "http://id.smns-bw.org/smns/collection/" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://je.jacq.org/JE00000020" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/05qpz1x62" } ,
        "publisher": { "type": "literal" , "value": "JE" } ,
        "graph": { "type": "uri" , "value": "http://je.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://kiel.jacq.org/KIEL0007010" } ,
        "institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/239180770" } ,
        "publisher": { "type": "literal" , "value": "KIEL" } ,
        "graph": { "type": "uri" , "value": "http://kiel.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://lz.jacq.org/LZ161177" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/03s7gtk40" } ,
        "publisher": { "type": "literal" , "value": "LZ" } ,
        "graph": { "type": "uri" , "value": "http://lz.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://mjg.jacq.org/MJG000015" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/023b0x485" } ,
        "publisher": { "type": "literal" , "value": "MJG" } ,
        "graph": { "type": "uri" , "value": "http://mjg.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://pi.jacq.org/PI000648" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/03ad39j10" } ,
        "publisher": { "type": "literal" , "value": "PI" } ,
        "graph": { "type": "uri" , "value": "http://pi.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://prc.jacq.org/PRC2535" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/024d6js02" } ,
        "publisher": { "type": "literal" , "value": "PRC" } ,
        "graph": { "type": "uri" , "value": "http://prc.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://tub.jacq.org/TUB002830" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/03a1kwz48" } ,
        "publisher": { "type": "literal" , "value": "TUB" } ,
        "graph": { "type": "uri" , "value": "http://tub.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://ubt.jacq.org/UBT0010195" } ,
        "institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/142509930" } ,
        "publisher": { "type": "literal" , "value": "UBT" } ,
        "graph": { "type": "uri" , "value": "http://ubt.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://w.jacq.org/W0000011a" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/01tv5y993" } ,
        "publisher": { "type": "literal" , "value": "W" } ,
        "graph": { "type": "uri" , "value": "http://w.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://wu.jacq.org/WU0000004" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/03prydq77" } ,
        "publisher": { "type": "literal" , "value": "WU" } ,
        "graph": { "type": "uri" , "value": "http://wu.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://www.botanicalcollections.be/specimen/BR0000005065868" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/01h1jbk91" } ,
        "graph": { "type": "uri" , "value": "http://botanicalcollections.be/specimen/" }
      }
    ]
  }
}

These data results have also missing values, the point is to extract .head.vars and use those values in the variable $fields to query all values out of .results.bindings[].[].value, the rest is getting a nice looking:

# table head (and adding an index column)
cat institutionID_20220808.json  | jq --raw-output '.head.vars | @tsv' | sed --regexp-extended 's@^@| # | @; s@$@ |@; s@[\t]@ | @g; h; s@[^|]@-@g;x;G;'
cat institutionID_20220808.json  | jq -r '.head.vars | @tsv' | sed -r 's@^@| # | @; s@$@ |@; s@[\t]@ | @g; h; s@[^|]@-@g;x;G;'
# sed: s@^@| # | @;  add index column-header | # | to line start
# sed: s@$@ |@;      append closing table row at the end |
# sed: s@[\t]@ | @g; as it has tab separated values, replace \t by | (the colums or table data cells)
# sed: h;            put this ready formatted header into hold space buffer
# sed: s@[^|]@-@g;   replace all but “|” by “-” to make the markdown header separation
# sed: x;            exchange hold space buffer (formatted 1st table row) with the markdown header separation
# sed: G;            now we have only the first table row in place and append (G) the markdown header separation by a \n 
#                    and get a nice complete table header:
# | # | cspp_example | institutionID | publisher | graph |
# |---|--------------|---------------|-----------|-------|

# table body
  # understand table data but sort them by another column (use sort --debug to find out)
  cat institutionID_20220808.json  | jq -r '.head.vars as $fields | .results.bindings[] |  [.[($fields[])].value] |@tsv'
    # show tab separator output
  
  # we use “|” as colum separators and also to format using “|” for the column command
  cat institutionID_20220808.json  | jq --raw-output '.head.vars as $fields | .results.bindings[] |  [.[($fields[])].value] |@tsv' \
    | sed --regexp-extended 's@^@| @; s@$@ |@; s@[\t]@ | @g;' | column --table --separator '|' --output-separator '|' | sort --field-separator='|' --key=5.1b --debug 
  # short options
  cat institutionID_20220808.json  | jq -r '.head.vars as $fields | .results.bindings[] |  [.[($fields[])].value] |@tsv' \
    | sed -r 's@^@| @; s@$@ |@; s@[\t]@ | @g;' | column -t -s '|' -o '|' | sort -t '|' -k5.1b --debug 
    # sort -t '|' -k5.1b using | as table sort separator, and based on that sort the 5th field, use 1st character in the 5th field to line end,
    # -k5.1b ignore b(lanks)
    # -k5.1Vb version sort, ignore b(lanks)
    # -k5.1n natural sort (aso.)

  # table body (and adding an index column (sed -r "=;"))
  # short options
  cat institutionID_20220808.json | jq --raw-output '.head.vars as $fields | .results.bindings[] |  [.[($fields[])].value] |@tsv' \
    | sed --regexp-extended 's@^@| @; s@$@ |@; s@[\t]@ | @g;' | column --table --separator '|' --output-separator '|' | sort --field-separator='|' --key=5.1b \
    | sed "=" | sed --regexp-extended "/^[[:digit:]]/{ N; s@(^[[:digit:]]+)\n@| \1 @; }"
  cat institutionID_20220808.json | jq -r '.head.vars as $fields | .results.bindings[] |  [.[($fields[])].value] |@tsv' \
    | sed -r 's@^@| @; s@$@ |@; s@[\t]@ | @g;' | column -t -s '|' -o '|' | sort -t '|' -k5.1b \
    | sed "=" | sed -r "/^[[:digit:]]/{ N; s@(^[[:digit:]]+)\n@| \1 @; }"
  # | 1 | https://admont.jacq.org/ADMONT100680                           | http://viaf.org/viaf/128466393   | ADMONT | http://admont.jacq.org                     |
  # | 2 | https://bak.jacq.org/BAK0-0000001                              | https://ror.org/006m4q736        | BAK    | http://bak.jacq.org                        |
  # | 3 | https://www.botanicalcollections.be/specimen/BR0000005065868   | https://ror.org/01h1jbk91        |        | http://botanicalcollections.be/specimen/   |
  # | … | …                                                              | …                                | …      | …                                          |

Sed (kurz für stream editor)

Anleitungen - https://snipcademy.com/shell-scripting-sed – gute Schemadarstellungen der Kommandoabfolgen

Functions for BASH-Programming

comment_exit_code() {
    # unused
    # -------------------------------
    # Usage:
    #   comment_exit_code $exit_code
    #   comment_exit_code $exit_code "Some more exact comment what was done"
    # -------------------------------
    local this_exit_code=$1
    local this_comment=${2-}

    case $this_exit_code in [1-9]|[1-9][0-9]|[1-9][0-9][0-9])
      if [[ "${#this_comment}" -lt 1 ]];then 
      echo -e "${ORANGE}Something unexpected happend. Exit Code: ${this_exit_code} $(kill -l $this_exit_code)${NOFORMAT}" 
      else
      echo -e "${ORANGE}Something unexpected happend: ${this_comment}. Exit Code: ${this_exit_code} $(kill -l $this_exit_code)${NOFORMAT}" 
      fi
      ;;
    esac
}

repeat_text() {
    # -------------------------------
    # Usage:
    #   repeat_text n-times text
    #   repeat_text 10 '.'
    #     prints 10 dots: ..........
    #   repeat_text 10 '.' storingvariablename
    #     stores 10 dots to $storingvariablename
    # -------------------------------
    # $1=number of patterns to repeat
    # $2=pattern
    # $3=output variable name
    local tmp
    local local_1=$1
    local local_2=$2
    local local_3=${3-}
    printf -v tmp '%*s' "$local_1"
    if [[ "$local_3" ]];then
      printf -v "$local_3" '%s' "${tmp// /$local_2}"
    else
      printf '%s' "${tmp// /$local_2}"
    fi
}

setup_colors() {
  # 0 - Normal Style; 1 - Bold; 2 - Dim; 3 - Italic; 4 - Underlined; 5 - Blinking; 7 - Reverse; 8 - Invisible;
  if [[ -t 2 ]] && [[ -z "${NO_COLOR-}" ]] && [[ "${TERM-}" != "dumb" ]]; then
    NOFORMAT='\033[0m' 
    BOLD='\033[1m' ITALIC='\033[3m'
    BLUE='\033[0;34m' BLUE_BOLD='\033[1;34m' BLUE_ITALIC='\033[3;34m' 
    CYAN='\033[0;36m' CYAN_BOLD='\033[1;36m' CYAN_ITALIC='\033[3;36m' 
    GREEN='\033[0;32m' GREEN_BOLD='\033[1;32m' GREEN_ITALIC='\033[3;32m' 
    ORANGE='\033[0;33m' ORANGE_BOLD='\033[1;33m' ORANGE_ITALIC='\033[3;33m' 
    PURPLE='\033[0;35m' PURPLE_BOLD='\033[1;35m' PURPLE_ITALIC='\033[3;35m' 
    RED='\033[0;31m' RED_BOLD='\033[1;31m' RED_ITALIC='\033[3;31m' 
    YELLOW='\033[1;33m' YELLOW_BOLD='\033[1;33m' YELLOW_ITALIC='\033[3;33m'
  else
    NOFORMAT='' 
    BOLD='' ITALIC=''
    BLUE='' BLUE_BOLD='' BLUE_ITALIC='' 
    CYAN='' CYAN_BOLD='' CYAN_ITALIC='' 
    GREEN='' GREEN_BOLD='' GREEN_ITALIC='' 
    ORANGE='' ORANGE_BOLD='' ORANGE_ITALIC='' 
    PURPLE='' PURPLE_BOLD='' PURPLE_ITALIC='' 
    RED='' RED_BOLD='' RED_ITALIC='' 
    YELLOW='' YELLOW_BOLD='' YELLOW_ITALIC=''
  fi
}
setup_colors

test_dependencies() {
  local exit_level=0

  if ! [[ -e "${working_directory}/${file_input}" ]];then
    echo -e "${ORANGE}# We can not find the data in ${NOFORMAT}${working_directory}/${file_input}${ORANGE} (stop)${NOFORMAT}";
    exit_level=1;
  fi
  if ! [[ -x "$(command -v $bin_dwcagent)" ]]; then
    printf "${ORANGE}Command${NOFORMAT} $bin_dwcagent ${ORANGE} to parse names were not found. See https://libraries.io/rubygems/dwc_agent${NOFORMAT}\n"; exit_level=1;
  fi
  if ! [[ -x "$(command -v awk)" ]]; then
    printf "${ORANGE}Command${NOFORMAT} awk ${ORANGE} to read the data was not found. Please install it in your software management system.${NOFORMAT}\n"; exit_level=1;
    exit_level=1;
  fi
  case $exit_level in [1-9]) 
    printf "${ORANGE}(stop)${NOFORMAT}\n"; 
    exit 1;; 
  esac
}
test_dependencies

processinfo() {
  echo -e "${GREEN}# ---------------------------- ${NOFORMAT}"
  echo -e "${GREEN}# Description: We read ${NOFORMAT}${file_input}${GREEN} and search for name lists of multiple names (in this case: containing an ampersand &) …${NOFORMAT}"
  echo -e "${GREEN}# We would parse it with ${NOFORMAT}${bin_dwcagent}${GREEN} and …${NOFORMAT}"
  echo -e "${GREEN}# … would write all parsed names into …${NOFORMAT}"
  echo -e "${GREEN}#   ${NOFORMAT}${file_output}"
  echo -e "${GREEN}#   ${NOFORMAT}${file_output_unique}"
  echo -e "${GREEN}# We would parse names from single text lines, which is slow but overall more accurate.${NOFORMAT}"
  echo -e "${GREEN}#   ($N_parallel parallel executions of dwcagent)${NOFORMAT}"
}

Weiterführende Literatur

Muth, R. 3. August 2012: Better Bash Scripting in 15 Minutes. New York (http://robertmuth.blogspot.com/2012/08/better-bash-scripting-in-15-minutes.html, abgerufen am 24. August 2022).

Vreckem, B. V. 6. November 2020: Bash best practices. In: cheat-sheets (Cheat sheets for various stuff). (https://bertvv.github.io/cheat-sheets/Bash.html, abgerufen am 14. November 2022).

Benutzer:Andreas Plank/BASH

Inhaltsverzeichnis

Dateinamen auffinden

Sortierte Dateien vergleichen

Compare Two URL Lists

Summieren von Zahlen, Listen

BASH kurz-Optionen zu lang-Optionen ersetzen (automatisiert aus Handbuch-Dokumentation (man pages))

Redirect Errors/Standard Output

Parameter Substitution

Command substitution

Extended mode

Substrings

For Loops

Useful Commands

sort

jq ~ Get Markdown Table from Dynamic Data Values

Sed (kurz für stream editor)

Functions for BASH-Programming

Weiterführende Literatur

Navigationsmenü

Suche

Benutzer:Andreas Plank/BASH

Dateinamen auffinden

Sortierte Dateien vergleichen

Compare Two URL Lists

Summieren von Zahlen, Listen

BASH kurz-Optionen zu lang-Optionen ersetzen (automatisiert aus Handbuch-Dokumentation (man pages))

Redirect Errors/Standard Output

Parameter Substitution

Command substitution

Extended mode

Substrings

For Loops

Useful Commands

sort

jq ~ Get Markdown Table from Dynamic Data Values

Sed (kurz für stream editor)

Functions for BASH-Programming

Weiterführende Literatur

Navigationsmenü

Suche

jq ~ Get Markdown Table from Dynamic Data Values