Benutzer:Andreas Plank/BASH: Unterschied zwischen den Versionen

Neueste Überarbeitung vom 31. Oktober 2024, 12:03 Uhr

Kurze Zusammenfassungen wichtiger Empfehlungen:

Muth (Better Bash Scripting, 2012)
Vreckem (Bash best practices, 2020)

Dateinamen auffinden

?      # Genau ein beliebiges Zeichen
*      # Beliebig viele (auch 0) beliebige Zeichen
[def]  # Eines der Zeichen
[^def] # Keines der angegebenen Zeichen
[!def] # Wie oben
[a-d]  # Alle Zeichen aus dem Bereich
ls -d /[a-d]* # Verzeichnisse → /bin  /boot  /dev

Anhand von Bildeigeschaften durchsuchen (benötigt ImageMagick), z.B. suche JPG-Bilder und gib möglichst das Ursprungsdatum, das Kamera-Modell, die Kamera-Marke aus:

IFS=$'\n'; for datei in $(find . -maxdepth 4 -iname '*.jpg'); do 
  echo "Erkunde $datei"; 
  identify -verbose "${datei}" \
  | grep -i --extended-regexp '(dateTimeOriginal|exif:Model:|exif:Make:)';
done; unset IFS;

# Ausgabe zusätzlich in Datei umleiten mit Befehl `tee`
IFS=$'\n'; for datei in $(find . -maxdepth 4 -iname '*.jpg'); do 
  echo "Erkunde $datei" | tee --append "Bilder mit Kameramodellen.txt"; 
  identify -verbose "${datei}" \
  | grep -i --extended-regexp '(dateTimeOriginal|exif:Model:|exif:Make:)' \
  | tee --append "Bilder mit Kameramodellen.txt";
done; unset IFS;

Sortierte Dateien vergleichen

Inhalte beider Dateien anzeigen:

cat Datei_1.txt # sollte sortiert sein	cat Datei_2.txt # sollte sortiert sein
drei-beide eins-1 fünf -1 fünf-beide zwei-1	acht-2 drei-beide fünf-beide sechs-2 zweilerlei-2 zweitens-2

Standardmäßige Ausgabe:

comm Datei_1.txt Datei_2.txt # es werden 3 Spalten ausgegeben

        acht-2
                drei-beide
eins-1
fünf  -1
                fünf-beide
        sechs-2
zwei-1
        zweilerlei-2
        zweitens-2

Es bedeuten:

Spalte 1: Ergebnis einzig aus Datei_1.txt
Spalte 2: Ergebnis einzig aus Datei_2.txt
Spalte 3: Ergebnis aus beiden Dateien

Das Kommando comm kann nun diese drei Ausgabespalten vermittels Option unterdrücken:

comm -1 unterdrücke Ausgabespalte 1 (Ergebnis übrig: Datei_2 + beide)
comm -12 unterdrücke Ausgabespalten 1 + 2 (Ergebnis übrig: aus beiden)
comm -13 unterdrücke Ausgabespalten 1 + 3 (Ergebnis übrig: Einziges aus Datei_2)
usw.

comm -23 Datei_1.txt Datei_2.txt 
# unterdrücke Ausgabespalte 2 + 3, erübrige Spalte 1, ergibt Einziges aus Datei_1

eins-1
fünf  -1
zwei-1

comm -13 Datei_1.txt Datei_2.txt 
# unterdrücke Ausgabespalte 1 + 3, erübrige Spalte 2, ergibt Einziges aus Datei_2

acht-2
sechs-2
zweilerlei-2
zweitens-2

comm -12 Datei_1.txt Datei_2.txt 
# unterdrücke Ausgabespalte 1 + 2, erübrige Spalte 3, ergibt aus beiderlei: Datei_1 und Datei_2

drei-beide
fünf-beide

Compare Two URL Lists

Assume to have two lists of URLs, one old and one new, and you want to get only those URLs that are actually new compared to the old list. The following example asumes to have CSV (comma separated values) or TSV (tab separated values) and tries to extract the very URL, regardless of any text after the URL.

For this we use command:
comm ‹-options› oldlist_sorted comparelist_sorted or
comm ‹-options› file_1_sorted file_2_sorted and this results in 3 output columns:

column-1         column-2        column-3
only-in-file_1
                 only-in-file_2
                                 in-file_1-and-2

… so using command comm you can now suppress one or two of these three output columns using the option:

comm -1 suppress output column 1 (results left: col 2 and 3, i.e. only-of-file_2 + both of in file_1-and-2)
comm -12 suppress output columns 1 + 2 (results left: col 3, i.e. from both of in file_1-and-2)
comm -13 suppress output columns 1 + 3 (results left: col 2, i.e. results only of in file_2)
comm -23 suppress output columns 2 + 3 (results left: col 1, i.e. results only of in file_1)
aso.

# # # # # # # # # # # # # #
# Check for URI-Differences old list vs. new list (in general for CSV or TSV lists)
# ```bash
# comm file1.txt file2.txt 
# LIST1-only-of-file1 LIST2-only-of-file2 LIST3-both-in-and-of-file1-file2
#
# comm -13 donelistsorted comparelistsorted > todolistsorted
# comm -13 donelistsorted comparelistsorted > todolistsorted
donelist_source=urilist_Naturalis_20220516.csv;
donelist_sorted=${donelist_source%.*}_sorted.tsv;
donelist_sorted_noprotocol=${donelist_source%.*}_sorted_noprotocol.tsv;

comparelist_source=urilist_Naturalis_20220817.tsv;
comparelist_sorted=${comparelist_source%.*}_sorted.tsv;
comparelist_sorted_noprotocol=${comparelist_source%.*}_sorted_noprotocol.tsv;

todolist_sorted=${comparelist_source%.*}_todo.tsv;
todolist_sorted_noprotocol=${comparelist_source%.*}_todo_noprotocol.tsv;

# assume CSV (comma separated values) or TSV (tab separated values)
# assume to have URLs beginning at the line start and after it (word-boundary), any other text herein after gets ignored

# compare by removing any protocol part (http:// https:// ftp:// sftp:// aso. OR remove <…>)
# without protocol
  sed --silent --regexp-extended '/[[:alpha:]]+:\/\// { s@[[:space:]]*<?[[:alpha:]]+://([^[:space:],]+)>?\b.*$@\1@; p }'  "$donelist_source"    | sort > "$donelist_sorted_noprotocol"
  sed --silent --regexp-extended '/[[:alpha:]]+:\/\// { s@[[:space:]]*<?[[:alpha:]]+://([^[:space:],]+)>?\b.*$@\1@; p }'  "$comparelist_source" | sort > "$comparelist_sorted_noprotocol"
  # only in done-list
  comm -23 "$donelist_sorted_noprotocol" "$comparelist_sorted_noprotocol" > "$todolist_sorted_noprotocol";
  # only in compare-list
  # comm -13 "$donelist_sorted_noprotocol" "$comparelist_sorted_noprotocol" > "$todolist_sorted_noprotocol";
  grep --count "/" "$todolist_sorted_noprotocol";
# with protocol
  sed --silent --regexp-extended '/[[:alpha:]]+:\/\// { s@[[:space:]]*<?([[:alpha:]]+://[^[:space:],]+)>?\b.*$@\1@; p }'  "$donelist_source"    | sort > "$donelist_sorted"
  sed --silent --regexp-extended '/[[:alpha:]]+:\/\// { s@[[:space:]]*<?([[:alpha:]]+://[^[:space:],]+)>?\b.*$@\1@; p }'  "$comparelist_source" | sort > "$comparelist_sorted"
  # only in done-list
  comm -23 "$donelist_sorted" "$comparelist_sorted" > "$todolist_sorted";
  # only in compare-list
  # comm -13 "$donelist_sorted" "$comparelist_sorted" > "$todolist_sorted";
  grep --count "/" "$todolist_sorted";
# ```

Summieren von Zahlen, Listen

# wir wollen die Spalte der Dateigrößen zusammenrechnen 1798891, 2804087 usw.
# Abhängigkeit: awk (verarbeite Textfelder und -dateien)
# Abhängigkeit: bc (eine Rechensprache für beliebige Genauigkeit)
ls -l *importsplit* | head -n 5 
# -rw-r--r-- 1 myusername myusername 1798891 Jun 30 16:32 Thread-01_botanicalcollections.be_20220509-1108_importsplit_01.rdf._normalized.ttl.trig.gz
# -rw-r--r-- 1 myusername myusername 2804087 Jun 30 16:32 Thread-01_botanicalcollections.be_20220509-1546_importsplit_01.rdf._normalized.ttl.trig.gz
# -rw-r--r-- 1 myusername myusername  862051 Jun 30 16:32 Thread-01_botanicalcollections.be_20220509-1546_importsplit_02.rdf._normalized.ttl.trig.gz
# -rw-r--r-- 1 myusername myusername 2276286 Jun 30 16:32 Thread-01_botanicalcollections.be_20220511-1106_importsplit_01.rdf._normalized.ttl.trig.gz
# -rw-r--r-- 1 myusername myusername  692749 Jun 30 16:32 Thread-01_botanicalcollections.be_20220511-1106_importsplit_02.rdf._normalized.ttl.trig.gz
ls -l *importsplit* | awk '{ print $5 }' | paste --serial --delimiters=+ - | bc
# 362150562

BASH kurz-Optionen zu lang-Optionen ersetzen (automatisiert aus Handbuch-Dokumentation (man pages))

# aus dem Handbuch von `iptables` die Optionen …
# [!] -k, --kurz-ausgeschriebeoption als auch …
#     -k, --kurz-ausgeschriebeoption …
# … herausgreifen und ein sed-Kommando daraus machen und hübsch in Spaltendarstellung
  man iptables | grep -i --extended-regexp -- '^([[:space:]]*|[[:space:]]*[\[\]!]*[[:space:]]*)-[[:digit:][:alpha:]],[[:space:]]*--' \
     | sed --regexp-extended 's/.*(-[[:alnum:]]),[[:space:]](--[[:alnum:]]+-?[[:alnum:]]+\b).*/s@ \1 @ \2 @g; #§marker§ &/g;' \
     | sort --ignore-case | column -s '#' -t | sed 's@§marker§@#@'

# Usage sed --file=short-options2long-options4rules.v4.sed rules.v4 # read changes on the screen
# Usage sed --in-place --file=short-options2long-options4rules.v4.sed rules.v4 # replace without backup
# Usage sed --in-place=.backup_20220802 --file=short-options2long-options4rules.v4.sed   # replace with backup: rules.v4.backup_20220802
#   man iptables | grep -i --extended-regexp -- '^([[:space:]]*|[[:space:]]*[\[\]!]*[[:space:]]*)-[[:digit:][:alpha:]],[[:space:]]*--' \
#      | sed -r 's/.*(-[[:alnum:]]),[[:space:]](--[[:alnum:]]+-?[[:alnum:]]+\b).*/s@ \1 @ \2 @g; # &/g;' \
#      | sort --ignore-case
#   man iptables | grep -i --extended-regexp -- '^([[:space:]]*|[[:space:]]*[\[\]!]*[[:space:]]*)-[[:digit:][:alpha:]],[[:space:]]*--' \
#      | sed -r 's/.*(-[[:alnum:]]),[[:space:]](--[[:alnum:]]+-?[[:alnum:]]+\b).*/s@ \1 @ \2 @g; #§marker§ &/g;' \
#      | sort --ignore-case | column -s '#' -t | sed 's@§marker§@#@'
s@ -4 @ --ipv4 @g;            #        -4, --ipv4
s@ -6 @ --ipv6 @g;            #        -6, --ipv6
s@^-A @--append @g;
s@ -A @ --append @g;          #        -A, --append chain rule-specification
s@ -C @ --check @g;           #        -C, --check chain rule-specification
s@ -c @ --set-counters @g;    #        -c, --set-counters packets bytes
s@ -D @ --delete @g;          #        -D, --delete chain rulenum ... -D, --delete chain rule-specification
s@ -d @ --destination @g;     #        [!] -d, --destination address[/mask][,...]
s@ -E @ --rename-chain @g;    #        -E, --rename-chain old-chain new-chain
s@ -F @ --flush @g;           #        -F, --flush [chain]
s@ -f @ --fragment @g;        #        [!] -f, --fragment
s@ -g @ --goto @g;            #        -g, --goto chain
s@ -i @ --in-interface @g;    #        [!] -i, --in-interface name
s@ -I @ --insert @g;          #        -I, --insert chain [rulenum] rule-specification
s@ -j @ --jump @g;            #        -j, --jump target
s@ -L @ --list @g;            #        -L, --list [chain]
s@ -m @ --match @g;           #        -m, --match match
s@ -N @ --new-chain @g;       #        -N, --new-chain chain
s@ -n @ --numeric @g;         #        -n, --numeric
s@ -o @ --out-interface @g;   #        [!] -o, --out-interface name
s@ -P @ --policy @g;          #        -P, --policy chain target
s@ -p @ --protocol @g;        #        [!] -p, --protocol protocol
s@ -R @ --replace @g;         #        -R, --replace chain rulenum rule-specification
s@ -S @ --list-rules @g;      #        -S, --list-rules [chain]
s@ -s @ --source @g;          #        [!] -s, --source address[/mask][,...]
s@ -t @ --table @g;           #        -t, --table table
s@ -v @ --verbose @g;         #        -v, --verbose
s@ -V @ --version @g;         #        -V, --version
s@ -w @ --wait @g;            #        -w, --wait [seconds]
s@ -X @ --delete-chain @g;    #        -X, --delete-chain [chain]
s@ -x @ --exact @g;           #        -x, --exact
s@ -Z @ --zero @g;            #        -Z, --zero [chain [rulenum]]

Redirect Errors/Standard Output

Siehe: https://www.thomas-krenn.com/de/wiki/Bash_stdout_und_stderr_umleiten

Funktion	Bash redirection
stdout -> Datei umleiten	`programm > Datei.txt`
stderr -> Datei umleiten	`programm 2> Datei.txt`
stdout UND stderr -> Datei umleiten	`programm &> Datei.txt`
stdout -> Datei umleiten UND stderr -> Datei umleiten	`programm > Datei_stdout.txt 2> Datei_stderr.txt`
stdout -> stderr	`programm 1>&2`
stderr -> stdout	`programm 2>&1`

Parameter Substitution

echo {a,b}{1,2,3} # a1 a2 a3 b1 b2 b3
# Inhalt von Archiven vergleichen
  diff <(tar tzf Buch1.tar.gz) <(tar tzf Buch.tar.gz)

d='message'
echo $d   # → message
echo ${d} # → message
# d may be not set but a local default (no definition!)
  echo ${d-default} # → default
  echo ${d-'*'}     # → *
  echo ${d-$1}      # → output of d or the first parameter given 
# d may be not set but a default definition
  echo ${d=default}
# no d + default given, but a message and procedure is than abandoned:
  echo ${d?message}
# A shell procedure that requires some parameters to be set might start as follows:
  : ${user?} ${acct?} ${bin?}
  # will print something like: "bash: user: Parameter ist Null oder nicht gesetzt."

${string/substring/replacement} # replaces the first match
${string//substring/replacement} # replaces all matches
${string#substring} # Deletes shortest match of $substring from front of $string.
${string##substring} # Deletes longest match of $substring from front of $string.
${string%substring} # Deletes shortest match of $substring from back of $string.
${string%%substring} # Deletes longest match of $substring from back of $string.

Command substitution

# commands in `...`
echo `pwd` # → /home/myusername → is the current working directory

ls `echo "$1"`
# is the same as
ls $1

set `date`; echo $6 $2 $3, $4 # → 2010 7. Dez, 17:28:44

for i in `ls -t`; do ... # list in time order (ls -t)

Extended mode

# the shell option extglob must be activated
help shopt # print help
shopt extglob
# extglob         off
shopt -s extglob; shopt extglob
# extglob         on
#shopt -u extglob; shopt extglob
## extglob         off

?(a|b|c) # Keine oder eine der eingeschlossenen Zeichenketten
*(a|b|c) # Keine oder mehrere der eingeschlossenen Zeichenketten
+(a|b|c) # Eine oder mehrere der eingeschlossenen Zeichenketten
@(a|b|c) # Genau eine der eingeschlossenen Zeichenketten
!(a|b|c) # Alle außer den eingeschlossenen Zeichenketten

# list all directory names, beginning with "bi", "*+" or "us" 
ls -d /+(bi|*+|us)*
# /bin  /lost+found  /usr  

# list all directory names, beginning not with "b*" and the 2nd character has no "o"
ls -d /!(b*|?o*)
# /cdrom  /dev  /etc  /floppy  /lib  /mnt  /opt  /proc  /sbin  /tmp  /usr  /var

Substrings

string="0123456789stop"
echo ${string:7}      # 789stop
echo ${string:0:7}    # 0123456
echo ${string:-10000} # 0123456789stop
echo ${string: -5}    # 9stop

For Loops

#!/bin/bash
# normalerweise ist IFS=" \t\n" aber Problem in for, weil Leerzeichen falsche Trennung erzeugt
OLDIFS=$IFS
IFS=$'\n'
for datei in *.{jpg,jpeg,JPG,JPEG};do
  if [ -e "$datei" ];then
    echo "$datei (jpg > png) …";
    convert "$datei" "${datei%.*}.png" 
  fi
done
IFS=$OLDIFS

Useful Commands

# sort ls listing by domain Thread-…_wu.jacq.org_ rather than numeric by Thread-01…
file_pattern="Thread-*_gat.jacq.org*2022*-[0-9][0-9][0-9][0-9]_modified.rdf.gz"
ls $file_pattern | sed -r 's@(Thread-)[0-9]+_(.+)@& \2@' | sort -k 2 | sed -r 's@^([^[:space:]]+) .+$@\1@;'

stat --printf="# \e[32mfile name %n\e[0m (%s bytes)…" file
stat --printf="# \e[32mfile name %n\e[0m was modified %y" file
stat --format='%y' file | grep --only-matching --extended-regexp '^[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}'

sort

Sorting of URLs but by domain regardless of which protocol there is (http, https, ftp aso.):

sort -t '/' -k3.1b urilist_JACQ_20220815_todo_sorted.tsv # sort by (t)able-field-character “/” 
  # -k*3*. 1 b  → set (k)ey field to sort after field 3, and from this position 
  # -k 3 .*1*b  → start sorting from the very 1st character to line end as being relevant for sorting
  # -k 3 . 1*b* → ignore (b)lanks
  sort --debug -t '/' -k3.1b urilist_JACQ_20220815_todo_sorted.tsv | head -n 6 # show what it is sorting actually

  # https://admont.jacq.org/ADMONT100002 
  # ____________________________________ for sort -t '/' -k1.1 --debug
  #        _____________________________ for sort -t '/' -k2.1 --debug
  #         ____________________________ for sort -t '/' -k3.1 --debug
  #                         ____________ for sort -t '/' -k4.1 --debug

sort --field-separator=$'\t' --stable +0 -4 --unique filename-tab-separated-data.tsv
  # sort uniquely from field 1 (i.e. +0), 2, … but not after field 5 (i.e. 4 (zero indexed field counting))

jq ~ Get Markdown Table from Dynamic Data Values

(see the data in the hidden box, click right)

{ "head": {
    "vars": [ "cspp_example" , "institutionID" , "publisher" , "graph" ]
  } ,
  "results": {
    "bindings": [
      { 
        "cspp_example": { "type": "uri" , "value": "http://coldb.mnhn.fr/catalognumber/mnhn/p/p00039900" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/03wkt5x30" } ,
        "publisher": { "type": "uri" , "value": "https://science.mnhn.fr/institution/mnhn/collection/p/item/search" } ,
        "graph": { "type": "uri" , "value": "http://coldb.mnhn.fr/catalognumber/mnhn/p/" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "http://data.biodiversitydata.nl/naturalis/specimen/113251" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/0566bfb96" } ,
        "graph": { "type": "uri" , "value": "http://data.biodiversitydata.nl/naturalis/specimen/" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "http://id.herb.oulu.fi/0014586" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/03yj89h83" } ,
        "publisher": { "type": "literal" , "value": "http://gbif.fi" } ,
        "graph": { "type": "uri" , "value": "http://tun.fi" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "http://id.snsb.info/snsb/collection/1000/1579/1000" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/05th1v540" } ,
        "publisher": { "type": "literal" , "value": "http://www.snsb.info" } ,
        "graph": { "type": "uri" , "value": "http://id.snsb.info/snsb/" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "http://lagu.jacq.org/object/AM-02278" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/01j60ss54" } ,
        "publisher": { "type": "literal" , "value": "LAGU" } ,
        "graph": { "type": "uri" , "value": "http://lagu.jacq.org/object" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "http://specimens.kew.org/herbarium/K000989827" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/00ynnr806" } ,
        "publisher": { "type": "uri" , "value": "https://www.kew.org" } ,
        "graph": { "type": "uri" , "value": "http://specimens.kew.org/herbarium/" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "http://tbi.jacq.org/object/TBI1014287" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/051qn8h41" } ,
        "publisher": { "type": "literal" , "value": "TBI" } ,
        "graph": { "type": "uri" , "value": "http://tbi.jacq.org/object" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "http://tun.fi/MHD.107807" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/03tcx6c30" } ,
        "publisher": { "type": "literal" , "value": "http://gbif.fi" } ,
        "graph": { "type": "uri" , "value": "http://tun.fi" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "http://tun.fi/MKA.342315" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/05vghhr25" } ,
        "publisher": { "type": "literal" , "value": "http://gbif.fi" } ,
        "graph": { "type": "uri" , "value": "http://tun.fi" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "http://tun.fi/MKA.863532" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/029pk6x14" } ,
        "publisher": { "type": "literal" , "value": "http://gbif.fi" } ,
        "graph": { "type": "uri" , "value": "http://tun.fi" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://admont.jacq.org/ADMONT100680" } ,
        "institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/128466393" } ,
        "publisher": { "type": "literal" , "value": "ADMONT" } ,
        "graph": { "type": "uri" , "value": "http://admont.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://bak.jacq.org/BAK0-0000001" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/006m4q736" } ,
        "publisher": { "type": "literal" , "value": "BAK" } ,
        "graph": { "type": "uri" , "value": "http://bak.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://boz.jacq.org/BOZ000001" } ,
        "institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/128699910" } ,
        "publisher": { "type": "literal" , "value": "BOZ" } ,
        "graph": { "type": "uri" , "value": "http://boz.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://brnu.jacq.org/BRNU000205" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/02j46qs45" } ,
        "publisher": { "type": "literal" , "value": "BRNU" } ,
        "graph": { "type": "uri" , "value": "http://brnu.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://data.rbge.org.uk/herb/E00000001" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/0349vqz63" } ,
        "publisher": { "type": "uri" , "value": "http://www.rbge.org.uk" } ,
        "graph": { "type": "uri" , "value": "http://data.rbge.org.uk/herb/" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://dr.jacq.org/DR000023" } ,
        "institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/155418159" } ,
        "publisher": { "type": "literal" , "value": "DR" } ,
        "graph": { "type": "uri" , "value": "http://dr.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://ere.jacq.org/ERE0000012" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/05mpgew40" } ,
        "publisher": { "type": "literal" , "value": "ERE" } ,
        "graph": { "type": "uri" , "value": "http://ere.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://gat.jacq.org/GAT0000014" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/02skbsp27" } ,
        "publisher": { "type": "literal" , "value": "GAT" } ,
        "graph": { "type": "uri" , "value": "http://gat.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://gjo.jacq.org/GJO0000012" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/00nxtmb68" } ,
        "publisher": { "type": "literal" , "value": "GJO" } ,
        "graph": { "type": "uri" , "value": "http://gjo.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://gzu.jacq.org/GZU000000208" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/01faaaf77" } ,
        "publisher": { "type": "literal" , "value": "GZU" } ,
        "graph": { "type": "uri" , "value": "http://gzu.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://hal.jacq.org/HAL0053120" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/05gqaka33" } ,
        "publisher": { "type": "literal" , "value": "HAL" } ,
        "graph": { "type": "uri" , "value": "http://hal.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://herbarium.bgbm.org/object/B100000004" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/00bv4cx53" } ,
        "publisher": { "type": "literal" , "value": "BGBM" } ,
        "graph": { "type": "uri" , "value": "http://herbarium.bgbm.org/object/" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://id.smns-bw.org/smns/collection/275449/772800/279829" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/05k35b119" } ,
        "graph": { "type": "uri" , "value": "http://id.smns-bw.org/smns/collection/" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://je.jacq.org/JE00000020" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/05qpz1x62" } ,
        "publisher": { "type": "literal" , "value": "JE" } ,
        "graph": { "type": "uri" , "value": "http://je.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://kiel.jacq.org/KIEL0007010" } ,
        "institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/239180770" } ,
        "publisher": { "type": "literal" , "value": "KIEL" } ,
        "graph": { "type": "uri" , "value": "http://kiel.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://lz.jacq.org/LZ161177" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/03s7gtk40" } ,
        "publisher": { "type": "literal" , "value": "LZ" } ,
        "graph": { "type": "uri" , "value": "http://lz.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://mjg.jacq.org/MJG000015" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/023b0x485" } ,
        "publisher": { "type": "literal" , "value": "MJG" } ,
        "graph": { "type": "uri" , "value": "http://mjg.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://pi.jacq.org/PI000648" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/03ad39j10" } ,
        "publisher": { "type": "literal" , "value": "PI" } ,
        "graph": { "type": "uri" , "value": "http://pi.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://prc.jacq.org/PRC2535" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/024d6js02" } ,
        "publisher": { "type": "literal" , "value": "PRC" } ,
        "graph": { "type": "uri" , "value": "http://prc.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://tub.jacq.org/TUB002830" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/03a1kwz48" } ,
        "publisher": { "type": "literal" , "value": "TUB" } ,
        "graph": { "type": "uri" , "value": "http://tub.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://ubt.jacq.org/UBT0010195" } ,
        "institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/142509930" } ,
        "publisher": { "type": "literal" , "value": "UBT" } ,
        "graph": { "type": "uri" , "value": "http://ubt.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://w.jacq.org/W0000011a" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/01tv5y993" } ,
        "publisher": { "type": "literal" , "value": "W" } ,
        "graph": { "type": "uri" , "value": "http://w.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://wu.jacq.org/WU0000004" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/03prydq77" } ,
        "publisher": { "type": "literal" , "value": "WU" } ,
        "graph": { "type": "uri" , "value": "http://wu.jacq.org" }
      } ,
      { 
        "cspp_example": { "type": "uri" , "value": "https://www.botanicalcollections.be/specimen/BR0000005065868" } ,
        "institutionID": { "type": "uri" , "value": "https://ror.org/01h1jbk91" } ,
        "graph": { "type": "uri" , "value": "http://botanicalcollections.be/specimen/" }
      }
    ]
  }
}

These data results have also missing values, the point is to extract .head.vars and use those values in the variable $fields to query all values out of .results.bindings[].[].value, the rest is getting a nice looking:

# table head (and adding an index column)
cat institutionID_20220808.json  | jq --raw-output '.head.vars | @tsv' | sed --regexp-extended 's@^@| # | @; s@$@ |@; s@[\t]@ | @g; h; s@[^|]@-@g;x;G;'
cat institutionID_20220808.json  | jq -r '.head.vars | @tsv' | sed -r 's@^@| # | @; s@$@ |@; s@[\t]@ | @g; h; s@[^|]@-@g;x;G;'
# sed: s@^@| # | @;  add index column-header | # | to line start
# sed: s@$@ |@;      append closing table row at the end |
# sed: s@[\t]@ | @g; as it has tab separated values, replace \t by | (the colums or table data cells)
# sed: h;            put this ready formatted header into hold space buffer
# sed: s@[^|]@-@g;   replace all but “|” by “-” to make the markdown header separation
# sed: x;            exchange hold space buffer (formatted 1st table row) with the markdown header separation
# sed: G;            now we have only the first table row in place and append (G) the markdown header separation by a \n 
#                    and get a nice complete table header:
# | # | cspp_example | institutionID | publisher | graph |
# |---|--------------|---------------|-----------|-------|

# table body
  # understand table data but sort them by another column (use sort --debug to find out)
  cat institutionID_20220808.json  | jq -r '.head.vars as $fields | .results.bindings[] |  [.[($fields[])].value] |@tsv'
    # show tab separator output
  
  # we use “|” as colum separators and also to format using “|” for the column command
  cat institutionID_20220808.json  | jq --raw-output '.head.vars as $fields | .results.bindings[] |  [.[($fields[])].value] |@tsv' \
    | sed --regexp-extended 's@^@| @; s@$@ |@; s@[\t]@ | @g;' | column --table --separator '|' --output-separator '|' | sort --field-separator='|' --key=5.1b --debug 
  # short options
  cat institutionID_20220808.json  | jq -r '.head.vars as $fields | .results.bindings[] |  [.[($fields[])].value] |@tsv' \
    | sed -r 's@^@| @; s@$@ |@; s@[\t]@ | @g;' | column -t -s '|' -o '|' | sort -t '|' -k5.1b --debug 
    # sort -t '|' -k5.1b using | as table sort separator, and based on that sort the 5th field, use 1st character in the 5th field to line end,
    # -k5.1b ignore b(lanks)
    # -k5.1Vb version sort, ignore b(lanks)
    # -k5.1n natural sort (aso.)

  # table body (and adding an index column (sed -r "=;"))
  # short options
  cat institutionID_20220808.json | jq --raw-output '.head.vars as $fields | .results.bindings[] |  [.[($fields[])].value] |@tsv' \
    | sed --regexp-extended 's@^@| @; s@$@ |@; s@[\t]@ | @g;' | column --table --separator '|' --output-separator '|' | sort --field-separator='|' --key=5.1b \
    | sed "=" | sed --regexp-extended "/^[[:digit:]]/{ N; s@(^[[:digit:]]+)\n@| \1 @; }"
  cat institutionID_20220808.json | jq -r '.head.vars as $fields | .results.bindings[] |  [.[($fields[])].value] |@tsv' \
    | sed -r 's@^@| @; s@$@ |@; s@[\t]@ | @g;' | column -t -s '|' -o '|' | sort -t '|' -k5.1b \
    | sed "=" | sed -r "/^[[:digit:]]/{ N; s@(^[[:digit:]]+)\n@| \1 @; }"
  # | 1 | https://admont.jacq.org/ADMONT100680                           | http://viaf.org/viaf/128466393   | ADMONT | http://admont.jacq.org                     |
  # | 2 | https://bak.jacq.org/BAK0-0000001                              | https://ror.org/006m4q736        | BAK    | http://bak.jacq.org                        |
  # | 3 | https://www.botanicalcollections.be/specimen/BR0000005065868   | https://ror.org/01h1jbk91        |        | http://botanicalcollections.be/specimen/   |
  # | … | …                                                              | …                                | …      | …                                          |

Sed (kurz für stream editor)

Anleitungen - https://snipcademy.com/shell-scripting-sed – gute Schemadarstellungen der Kommandoabfolgen

Functions for BASH-Programming

comment_exit_code() {
    # unused
    # -------------------------------
    # Usage:
    #   comment_exit_code $exit_code
    #   comment_exit_code $exit_code "Some more exact comment what was done"
    # -------------------------------
    local this_exit_code=$1
    local this_comment=${2-}

    case $this_exit_code in [1-9]|[1-9][0-9]|[1-9][0-9][0-9])
      if [[ "${#this_comment}" -lt 1 ]];then 
      echo -e "${ORANGE}Something unexpected happened. Exit Code: ${this_exit_code} $(kill -l $this_exit_code)${NOFORMAT}" 
      else
      echo -e "${ORANGE}Something unexpected happened: ${this_comment}. Exit Code: ${this_exit_code} $(kill -l $this_exit_code)${NOFORMAT}" 
      fi
      ;;
    esac
}

repeat_text() {
    # -------------------------------
    # Usage:
    #   repeat_text n-times text
    #   repeat_text 10 '.'
    #     prints 10 dots: ..........
    #   repeat_text 10 '.' storingvariablename
    #     stores 10 dots to $storingvariablename
    # -------------------------------
    # $1=number of patterns to repeat
    # $2=pattern
    # $3=output variable name
    local tmp
    local local_1=$1
    local local_2=$2
    local local_3=${3-}
    printf -v tmp '%*s' "$local_1"
    if [[ "$local_3" ]];then
      printf -v "$local_3" '%s' "${tmp// /$local_2}"
    else
      printf '%s' "${tmp// /$local_2}"
    fi
}

setup_colors() {
  # 0 - Normal Style; 1 - Bold; 2 - Dim; 3 - Italic; 4 - Underlined; 5 - Blinking; 7 - Reverse; 8 - Invisible;
  if [[ -t 2 ]] && [[ -z "${NO_COLOR-}" ]] && [[ "${TERM-}" != "dumb" ]]; then
    NOFORMAT='\033[0m' 
    BOLD='\033[1m' ITALIC='\033[3m'
    BLUE='\033[0;34m' BLUE_BOLD='\033[1;34m' BLUE_ITALIC='\033[3;34m' 
    CYAN='\033[0;36m' CYAN_BOLD='\033[1;36m' CYAN_ITALIC='\033[3;36m' 
    GREEN='\033[0;32m' GREEN_BOLD='\033[1;32m' GREEN_ITALIC='\033[3;32m' 
    ORANGE='\033[0;33m' ORANGE_BOLD='\033[1;33m' ORANGE_ITALIC='\033[3;33m' 
    PURPLE='\033[0;35m' PURPLE_BOLD='\033[1;35m' PURPLE_ITALIC='\033[3;35m' 
    RED='\033[0;31m' RED_BOLD='\033[1;31m' RED_ITALIC='\033[3;31m' 
    YELLOW='\033[1;33m' YELLOW_BOLD='\033[1;33m' YELLOW_ITALIC='\033[3;33m'
  else
    NOFORMAT='' 
    BOLD='' ITALIC=''
    BLUE='' BLUE_BOLD='' BLUE_ITALIC='' 
    CYAN='' CYAN_BOLD='' CYAN_ITALIC='' 
    GREEN='' GREEN_BOLD='' GREEN_ITALIC='' 
    ORANGE='' ORANGE_BOLD='' ORANGE_ITALIC='' 
    PURPLE='' PURPLE_BOLD='' PURPLE_ITALIC='' 
    RED='' RED_BOLD='' RED_ITALIC='' 
    YELLOW='' YELLOW_BOLD='' YELLOW_ITALIC=''
  fi
}
setup_colors

test_dependencies() {
  local exit_level=0

  if ! [[ -e "${working_directory}/${file_input}" ]];then
    echo -e "${ORANGE}# We can not find the data in ${NOFORMAT}${working_directory}/${file_input}${ORANGE} (stop)${NOFORMAT}";
    exit_level=1;
  fi
  if ! [[ -x "$(command -v $bin_dwcagent)" ]]; then
    printf "${ORANGE}Command${NOFORMAT} $bin_dwcagent ${ORANGE} to parse names were not found. See https://libraries.io/rubygems/dwc_agent${NOFORMAT}\n"; exit_level=1;
  fi
  if ! [[ -x "$(command -v awk)" ]]; then
    printf "${ORANGE}Command${NOFORMAT} awk ${ORANGE} to read the data was not found. Please install it in your software management system.${NOFORMAT}\n"; exit_level=1;
    exit_level=1;
  fi
  case $exit_level in [1-9]) 
    printf "${ORANGE}(stop)${NOFORMAT}\n"; 
    exit 1;; 
  esac
}
test_dependencies

processinfo() {
  echo -e "${GREEN}# ---------------------------- ${NOFORMAT}"
  echo -e "${GREEN}# Description: We read ${NOFORMAT}${file_input}${GREEN} and search for name lists of multiple names (in this case: containing an ampersand &) …${NOFORMAT}"
  echo -e "${GREEN}# We would parse it with ${NOFORMAT}${bin_dwcagent}${GREEN} and …${NOFORMAT}"
  echo -e "${GREEN}# … would write all parsed names into …${NOFORMAT}"
  echo -e "${GREEN}#   ${NOFORMAT}${file_output}"
  echo -e "${GREEN}#   ${NOFORMAT}${file_output_unique}"
  echo -e "${GREEN}# We would parse names from single text lines, which is slow but overall more accurate.${NOFORMAT}"
  echo -e "${GREEN}#   ($N_parallel parallel executions of dwcagent)${NOFORMAT}"
}

Date and Time

# seconds to days hours min sec (→ https://unix.stackexchange.com/a/338844 “bash - Displaying seconds as days/hours/mins/seconds?”)
seconds="755";date --utc --date="@$seconds" +"$(( $seconds/3600/24 )) days %H hours %Mmin %Ssec"

Calculate time process

#!/bin/bash
if ! command -v datediff &> /dev/null &&  ! command -v dateutils.ddiff &> /dev/null
then
  echo -e "\e[31m# Error: Neither command datediff or dateutils.ddiff could not be found. Please install package dateutils.\e[0m"
  do_exit=1
else
  if ! command -v datediff &> /dev/null
  then
    # echo "Command dateutils.ddiff found"
    exec_datediff="dateutils.ddiff"
  elif ! command -v dateutils.ddiff &> /dev/null
    then
      # echo "Command datediff found"
      exec_datediff="datediff"
  fi
fi

datetime_start=`date --rfc-3339 'ns'` ;

echo "Sleep for 5 seconds… or some other process is going on …"; sleep 5; echo "Completed";

datetime_end=`date --rfc-3339 'ns'`;


echo $( date --date="$datetime_start" '+# Started: %Y-%m-%d %H:%M:%S%:z' )
echo $( date --date="$datetime_end"   '+# Ended:   %Y-%m-%d %H:%M:%S%:z' )
#   echo "# Started: $datetime_start" 
#   echo "# Ended:   $datetime_end"   
  
$exec_datediff "$datetime_start" "$datetime_end" -f "# Done. This took %dd  %0Hh:%0Mm:%0Ss to do something"

get_timediff_for_njobs_new () {
  # Description: calculate estimated time to finish n jobs and the estimated total time
  # ---------------------------------
  # Dependency: package dateutils
  # ---------------------------------
  # Usage:
  # get_timediff_for_njobs_new --test # to check for dependencies (datediff)
  # get_timediff_for_njobs_new begintime nowtime ntotaljobs njobsnowdone
  # get_timediff_for_njobs_new "2021-12-06 16:47:29" "2021-12-09 13:38:08" 696926 611613
  # ---------------------------------
  # echo '('`date +"%s.%N"` ' * 1000)/1' | bc # get milliseconds
  # echo '('`date +"%s.%N"` ' * 1000000)/1' | bc # get nanoseconds
  # echo $( date --rfc-3339 'ns' ) | ( read -rsd '' x; echo ${x@Q} ) # escaped
  # ---------------------------------
    
  local this_command_timediff
  
  # read if test mode to check commands
  while [[ "$#" -gt 0 ]]
  do
    case $1 in
      -t|--test)
        doexit=0
        if ! command -v datediff &> /dev/null &&  ! command -v dateutils.ddiff &> /dev/null
        then
          echo -e "# \e[31mError: Neither command datediff or dateutils.ddiff could not be found. Please install package dateutils.\e[0m"
          doexit=1
        fi
        if ! command -v sed &> /dev/null 
        then
          echo -e "# \e[31mError: command sed (stream editor) could not be found. Please install package sed.\e[0m"
          doexit=1
        fi
        if ! command -v bc &> /dev/null 
        then
          echo -e "# \e[31mError: command bc (arbitrary precision calculator) could not be found. Please install package bc.\e[0m"
          doexit=1
        fi
        if [[ $doexit -gt 1 ]];then
          exit;
        else
          return 0 # (return 0 seems success?) and exit function
        fi
      ;;
      *)
      break
      ;;
    esac
  done
  
  if ! command -v datediff &> /dev/null
  then
    # echo "Command dateutils.ddiff found"
    this_command_timediff="dateutils.ddiff"
  elif ! command -v dateutils.ddiff &> /dev/null
    then
      # echo "Command datediff found"
      this_command_timediff="datediff"
  fi

  # START estimate time to do 
  # convert also "2022-06-30_14h56m10s" to "2022-06-30 14:56:10"
  this_given_start_time=$( echo $1 | sed -r 's@([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2})[_[:space:]-]([[:digit:]]{2})h([[:digit:]]{2})m([[:digit:]]{2})s@\1 \2:\3:4@' )
  this_given_now_time=$(   echo $2 | sed -r 's@([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2})[_[:space:]-]([[:digit:]]{2})h([[:digit:]]{2})m([[:digit:]]{2})s@\1 \2:\3:4@' )
  
  local this_unixnanoseconds_start_timestamp=$(date --date="$this_given_start_time" '+%s.%N')
  local this_unixnanoseconds_now=$(date --date="$this_given_now_time" '+%s.%N')
  local this_unixseconds_todo=0
  local this_n_jobs_all=$(expr $3 + 0)
  local this_i_job_counter=$(expr $4 + 0)
  # echo "scale=10; 1642073008.587244684 - 1642028400.000000000" | bc -l
  local this_timediff_unixnanoseconds=`echo "scale=10; $this_unixnanoseconds_now - $this_unixnanoseconds_start_timestamp" | bc -l`
  # $(( $this_unixnanoseconds_now - $this_unixnanoseconds_start_timestamp ))
  local this_n_jobs_todo=$(( $this_n_jobs_all - $this_i_job_counter ))
  local this_msg_estimated_sofar=""

  # echo -e "\033[2m# DEBUG Test mode: all together $this_n_jobs_all ; counter $this_i_job_counter\033[0m"
  if [[ $this_n_jobs_all -eq $this_i_job_counter ]];then # done
    this_unixseconds_todo=0
    # njobs_done_so_far=`$this_command_timediff "@$this_unixnanoseconds_start_timestamp" "@$this_unixnanoseconds_now" -f "all $this_i_job_counter done, duration %dd %0Hh:%0Mm:%0Ss"`
    this_msg_estimated_sofar="nothing left to do"
  else
    # this_unixseconds_todo=$(( $this_timediff_unixnanoseconds * $this_n_jobs_todo / $this_i_job_counter ))
    # this_unixseconds_todo=$(( $this_timediff_unixnanoseconds * $this_n_jobs_todo / $this_i_job_counter ))
    this_unixseconds_todo=`echo "scale=0; $this_timediff_unixnanoseconds * $this_n_jobs_todo / $this_i_job_counter" | bc -l`
    
    job_singular_or_plural=$([ $this_n_jobs_todo -gt 1 ]  && echo jobs  || echo job )
    if [[ $this_unixseconds_todo -ge $(( 60 * 60 * 24 * 2 )) ]];then
      this_msg_estimated_sofar=`$this_command_timediff "@0" "@$this_unixseconds_todo" -f "Still $this_n_jobs_todo $job_singular_or_plural to do, estimated end %0ddays %0Hh:%0Mmin:%0Ssec"`
    elif [[ $this_unixseconds_todo -ge $(( 60 * 60 * 24 )) ]];then
      this_msg_estimated_sofar=`$this_command_timediff "@0" "@$this_unixseconds_todo" -f "Still $this_n_jobs_todo $job_singular_or_plural to do, estimated end %0dday %0Hh:%0Mmin:%0Ssec"`
    elif [[ $this_unixseconds_todo -ge $(( 60 * 60 * 1 )) ]];then
      this_msg_estimated_sofar=`$this_command_timediff "@0" "@$this_unixseconds_todo" -f "Still $this_n_jobs_todo $job_singular_or_plural to do, estimated end %0Hh:%0Mmin:%0Ssec"`
    elif [[ $this_unixseconds_todo -lt $(( 60 * 60 * 1 )) ]];then
      this_msg_estimated_sofar=`$this_command_timediff "@0" "@$this_unixseconds_todo" -f "Still $this_n_jobs_todo $job_singular_or_plural to do, estimated end %0Mmin:%0Ssec"`
    fi
  fi
  
  this_unixseconds_done=`printf "%.0f" $(echo "scale=0; $this_unixnanoseconds_now - $this_unixnanoseconds_start_timestamp" | bc -l)`
  this_unixseconds_total=`printf "%.0f" $(echo "scale=0; $this_unixseconds_done + $this_unixseconds_todo" | bc -l)`  
  if [[ $this_unixseconds_total -ge $(( 60 * 60 * 24 * 2 )) ]];then
    this_msg_time_total=`$this_command_timediff "@0" "@$this_unixseconds_total" -f "total time: %0ddays %0Hh:%0Mmin:%0Ssec"`
  elif [[ $this_unixseconds_total -ge $(( 60 * 60 * 24 )) ]];then
    this_msg_time_total=`$this_command_timediff "@0" "@$this_unixseconds_total" -f "total time: %0dday %0Hh:%0Mmin:%0Ssec"`
  elif [[ $this_unixseconds_total -ge $(( 60 * 60 * 1 )) ]];then
    this_msg_time_total=`$this_command_timediff "@0" "@$this_unixseconds_total" -f "total time: %0Hh:%0Mmin:%0Ssec"`
  elif [[ $this_unixseconds_total -lt $(( 60 * 60 * 1 )) ]];then
    this_msg_time_total=`$this_command_timediff "@0" "@$this_unixseconds_total" -f "total time: %0Mmin:%0Ssec"`
  fi
  if ! [[ $this_unixseconds_todo -eq 0 ]];then this_msg_time_total="estimated $this_msg_time_total"; fi
  
  #echo "from $this_n_jobs_all, $njobs_done_so_far; $this_msg_estimated_sofar"
  echo "${this_msg_estimated_sofar} (${this_msg_time_total})"
  # END estimate time to do 
}
export -f get_timediff_for_njobs_new # export needed otherwise /usr/bin/bash: get_timediff_for_njobs_new: command not found
get_timediff_for_njobs_new --test

Weiterführende Literatur

Muth, R. 3. August 2012: Better Bash Scripting in 15 Minutes. New York (http://robertmuth.blogspot.com/2012/08/better-bash-scripting-in-15-minutes.html, abgerufen am 24. August 2022).

Vreckem, B. V. 6. November 2020: Bash best practices. In: cheat-sheets (Cheat sheets for various stuff). (https://bertvv.github.io/cheat-sheets/Bash.html, abgerufen am 14. November 2022).

Benutzer:Andreas Plank/BASH: Unterschied zwischen den Versionen

Neueste Überarbeitung vom 31. Oktober 2024, 12:03 Uhr

Inhaltsverzeichnis

Dateinamen auffinden

Sortierte Dateien vergleichen

Compare Two URL Lists

Summieren von Zahlen, Listen

BASH kurz-Optionen zu lang-Optionen ersetzen (automatisiert aus Handbuch-Dokumentation (man pages))

Redirect Errors/Standard Output

Parameter Substitution

Command substitution

Extended mode

Substrings

For Loops

Useful Commands

sort

jq ~ Get Markdown Table from Dynamic Data Values

Sed (kurz für stream editor)

Functions for BASH-Programming

Date and Time

Weiterführende Literatur

Navigationsmenü

Suche

@@ Zeile 1: / Zeile 1: @@
+{{ZITATFORMAT Kapitälchen}}
+Kurze Zusammenfassungen wichtiger Empfehlungen:
+* {{Zitat|Muth - Better Bash Scripting - 2012|Muth (Better Bash Scripting, 2012)}}
+* {{Zitat | Vreckem - Bash best practices - 2020 |Vreckem (Bash best practices, 2020)}}
 == Dateinamen auffinden ==
@@ Zeile 9: / Zeile 14: @@
 [a-d]  # Alle Zeichen aus dem Bereich
 ls -d /[a-d]* # Verzeichnisse → /bin  /boot  /dev
+</syntaxhighlight>
+Anhand von Bildeigeschaften durchsuchen (benötigt ImageMagick), z.B. suche JPG-Bilder und gib möglichst das Ursprungsdatum, das Kamera-Modell, die Kamera-Marke aus:
+<syntaxhighlight lang="bash" style="font-size:smaller">
+IFS=$'\n'; for datei in $(find . -maxdepth 4 -iname '*.jpg'); do
+  echo "Erkunde $datei";
+  identify -verbose "${datei}" \
+  | grep -i --extended-regexp '(dateTimeOriginal|exif:Model:|exif:Make:)';
+done; unset IFS;
+# Ausgabe zusätzlich in Datei umleiten mit Befehl `tee`
+IFS=$'\n'; for datei in $(find . -maxdepth 4 -iname '*.jpg'); do
+  echo "Erkunde $datei" | tee --append "Bilder mit Kameramodellen.txt";
+  identify -verbose "${datei}" \
+  | grep -i --extended-regexp '(dateTimeOriginal|exif:Model:|exif:Make:)' \
+  | tee --append "Bilder mit Kameramodellen.txt";
+done; unset IFS;
 </syntaxhighlight>
@@ Zeile 71: / Zeile 93: @@
 <pre style="font-size:smaller;margin-left:1.5em;">drei-beide
 fünf-beide</pre>
+== Compare Two URL Lists ==
+Assume to have two lists of URLs, one old and one new, and you want to get only those URLs that are actually new compared to the old list. The following example asumes to have CSV (comma separated values) or TSV (tab separated values) and tries to extract the very URL, regardless of any text after the URL.
+<div style="margin-left:1.5em">
+For this we use command:<br/><code>comm ‹-options› oldlist_sorted comparelist_sorted</code> or<br/><code>comm ‹-options› file_1_sorted file_2_sorted</code> and this results in 3 output columns:
+ column-1         column-2        column-3
+ only-in-file_1
+                  only-in-file_2
+                                  in-file_1-and-2
+… so using command <code>comm</code> you can now suppress one or two of these three output columns using the option:
+* <code>comm -1</code> suppress output column 1 (results left: col 2 and 3, i.e. only-of-file_2 + both of in file_1-and-2)
+* <code>comm -12</code> suppress output columns 1 + 2 (results left: col 3, i.e. from both of in file_1-and-2)
+* <code>comm -13</code> suppress output columns 1 + 3 (results left: col 2, i.e. results only of in file_2)
+* <code>comm -23</code> suppress output columns 2 + 3 (results left: col 1, i.e. results only of in file_1)
+* aso.
+</div>
 <syntaxhighlight lang="bash" style="font-size:smaller;">
 # # # # # # # # # # # # # #
-# Check for URI-Differences (in general)
+# Check for URI-Differences old list vs. new list (in general for CSV or TSV lists)
 # ```bash
+# comm file1.txt file2.txt
+# LIST1-only-of-file1 LIST2-only-of-file2 LIST3-both-in-and-of-file1-file2
+#
 # comm -13 donelistsorted comparelistsorted > todolistsorted
 # comm -13 donelistsorted comparelistsorted > todolistsorted
-donelist_source=Thread-XX_id.snsb.info_20220620-1306.log;
+donelist_source=urilist_Naturalis_20220516.csv;
-donelistsorted=Thread-XX_id.snsb.info_20220620-1306_done.txt;
+donelist_sorted=${donelist_source%.*}_sorted.tsv;
+donelist_sorted_noprotocol=${donelist_source%.*}_sorted_noprotocol.tsv;
+comparelist_source=urilist_Naturalis_20220817.tsv;
+comparelist_sorted=${comparelist_source%.*}_sorted.tsv;
+comparelist_sorted_noprotocol=${comparelist_source%.*}_sorted_noprotocol.tsv;
-comparelist_source=urilist_SNSB_20220620_per_01x250000.txt;
+todolist_sorted=${comparelist_source%.*}_todo.tsv;
-comparelistsorted=urilist_SNSB_20220620_per_01x250000_sorted.txt;
+todolist_sorted_noprotocol=${comparelist_source%.*}_todo_noprotocol.tsv;
-todolistsorted=Thread-XX_id.snsb.info_20220620-1306_todo.txt;
+# assume CSV (comma separated values) or TSV (tab separated values)
+# assume to have URLs beginning at the line start and after it (word-boundary), any other text herein after gets ignored
-sed --silent -r '/http/{s@.*(https?://[^[:space:]]+)[[:space:]]Codes.+@\1@; p}'  "$donelist_source" | sort > "$donelistsorted"
+# compare by removing any protocol part (http:// https:// ftp:// sftp:// aso. OR remove <…>)
-sed --silent -r '/http/{s@.*(https?://[^[:space:]]+)[[:space:]].+@\1@; p}'  "$comparelist_source" | sort > "$comparelistsorted"
+# without protocol
-comm -13 "$donelistsorted" "$comparelistsorted" > "$todolistsorted";
+  sed --silent --regexp-extended '/[[:alpha:]]+:\/\// { s@[[:space:]]*<?[[:alpha:]]+://([^[:space:],]+)>?\b.*$@\1@; p }'  "$donelist_source"    | sort > "$donelist_sorted_noprotocol"
-   # comm -13 urilist_Naturalis_20220509_nohttps.txt urilist_Naturalis_20220511_nohttps.txt > urilist_Naturalis_20220509vs20220511.txt
+  sed --silent --regexp-extended '/[[:alpha:]]+:\/\// { s@[[:space:]]*<?[[:alpha:]]+://([^[:space:],]+)>?\b.*$@\1@; p }'  "$comparelist_source" | sort > "$comparelist_sorted_noprotocol"
-   grep --count http "$todolistsorted";
+  # only in done-list
-  # 20746
+  comm -23 "$donelist_sorted_noprotocol" "$comparelist_sorted_noprotocol" > "$todolist_sorted_noprotocol";
+  # only in compare-list
+  # comm -13 "$donelist_sorted_noprotocol" "$comparelist_sorted_noprotocol" > "$todolist_sorted_noprotocol";
+  grep --count "/" "$todolist_sorted_noprotocol";
+# with protocol
+  sed --silent --regexp-extended '/[[:alpha:]]+:\/\// { s@[[:space:]]*<?([[:alpha:]]+://[^[:space:],]+)>?\b.*$@\1@; p }'  "$donelist_source"    | sort > "$donelist_sorted"
+  sed --silent --regexp-extended '/[[:alpha:]]+:\/\// { s@[[:space:]]*<?([[:alpha:]]+://[^[:space:],]+)>?\b.*$@\1@; p }'  "$comparelist_source" | sort > "$comparelist_sorted"
+  # only in done-list
+  comm -23 "$donelist_sorted" "$comparelist_sorted" > "$todolist_sorted";
+  # only in compare-list
+   # comm -13 "$donelist_sorted" "$comparelist_sorted" > "$todolist_sorted";
+   grep --count "/" "$todolist_sorted";
 # ```
+</syntaxhighlight>
+== Summieren von Zahlen, Listen ==
+<syntaxhighlight lang="bash" style="font-size:smaller;">
+# wir wollen die Spalte der Dateigrößen zusammenrechnen 1798891, 2804087 usw.
+# Abhängigkeit: awk (verarbeite Textfelder und -dateien)
+# Abhängigkeit: bc (eine Rechensprache für beliebige Genauigkeit)
+ls -l *importsplit* | head -n 5
+# -rw-r--r-- 1 myusername myusername 1798891 Jun 30 16:32 Thread-01_botanicalcollections.be_20220509-1108_importsplit_01.rdf._normalized.ttl.trig.gz
+# -rw-r--r-- 1 myusername myusername 2804087 Jun 30 16:32 Thread-01_botanicalcollections.be_20220509-1546_importsplit_01.rdf._normalized.ttl.trig.gz
+# -rw-r--r-- 1 myusername myusername  862051 Jun 30 16:32 Thread-01_botanicalcollections.be_20220509-1546_importsplit_02.rdf._normalized.ttl.trig.gz
+# -rw-r--r-- 1 myusername myusername 2276286 Jun 30 16:32 Thread-01_botanicalcollections.be_20220511-1106_importsplit_01.rdf._normalized.ttl.trig.gz
+# -rw-r--r-- 1 myusername myusername  692749 Jun 30 16:32 Thread-01_botanicalcollections.be_20220511-1106_importsplit_02.rdf._normalized.ttl.trig.gz
+ls -l *importsplit* | awk '{ print $5 }' | paste --serial --delimiters=+ - | bc
+# 362150562
 </syntaxhighlight>
@@ Zeile 99: / Zeile 175: @@
 <syntaxhighlight lang="bash" style="font-size:smaller;">
 # aus dem Handbuch von `iptables` die Optionen …
-# [!] -k, --kurz-ausgeschriebe option als auch …
+# [!] -k, --kurz-ausgeschriebeoption als auch …
-#     -k, --kurz-ausgeschriebe option …
+#     -k, --kurz-ausgeschriebeoption …
 # … herausgreifen und ein sed-Kommando daraus machen und hübsch in Spaltendarstellung
    man iptables | grep -i --extended-regexp -- '^([[:space:]]*|[[:space:]]*[\[\]!]*[[:space:]]*)-[[:digit:][:alpha:]],[[:space:]]*--' \
-      | sed -r 's/.*(-[[:alnum:]]),[[:space:]](--[[:alnum:]]+-?[[:alnum:]]+\b).*/s@ \1 @ \2 @g; #§marker§ &/g;' \
+      | sed --regexp-extended 's/.*(-[[:alnum:]]),[[:space:]](--[[:alnum:]]+-?[[:alnum:]]+\b).*/s@ \1 @ \2 @g; #§marker§ &/g;' \
       | sort --ignore-case | column -s '#' -t | sed 's@§marker§@#@'
 </syntaxhighlight>
@@ Zeile 267: / Zeile 343: @@
 #!/bin/bash
 # normalerweise ist IFS=" \t\n" aber Problem in for, weil Leerzeichen falsche Trennung erzeugt
-OLDIF=$IFS
+OLDIFS=$IFS
 IFS=$'\n'
 for datei in *.{jpg,jpeg,JPG,JPEG};do
@@ Zeile 281: / Zeile 357: @@
-<syntaxhighlight lang="bash">
+<syntaxhighlight lang="bash" style="font-size:smaller;">
 # sort ls listing by domain Thread-…_wu.jacq.org_ rather than numeric by Thread-01…
 file_pattern="Thread-*_gat.jacq.org*2022*-[0-9][0-9][0-9][0-9]_modified.rdf.gz"
@@ Zeile 290: / Zeile 366: @@
 stat --format='%y' file | grep --only-matching --extended-regexp '^[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}'
 </syntaxhighlight>
+=== sort ===
+Sorting of URLs but by domain regardless of which protocol there is (http, https, ftp aso.):
+<syntaxhighlight lang="bash" style="font-size:smaller;">
+sort -t '/' -k3.1b urilist_JACQ_20220815_todo_sorted.tsv # sort by (t)able-field-character “/”
+  # -k*3*. 1 b  → set (k)ey field to sort after field 3, and from this position
+  # -k 3 .*1*b  → start sorting from the very 1st character to line end as being relevant for sorting
+  # -k 3 . 1*b* → ignore (b)lanks
+  sort --debug -t '/' -k3.1b urilist_JACQ_20220815_todo_sorted.tsv | head -n 6 # show what it is sorting actually
+  # https://admont.jacq.org/ADMONT100002
+  # ____________________________________ for sort -t '/' -k1.1 --debug
+  #        _____________________________ for sort -t '/' -k2.1 --debug
+  #         ____________________________ for sort -t '/' -k3.1 --debug
+  #                         ____________ for sort -t '/' -k4.1 --debug
+sort --field-separator=$'\t' --stable +0 -4 --unique filename-tab-separated-data.tsv
+  # sort uniquely from field 1 (i.e. +0), 2, … but not after field 5 (i.e. 4 (zero indexed field counting))
+</syntaxhighlight>
+== jq ~ Get Markdown Table from Dynamic Data Values ==
+(see the data in the hidden box, click right)
+<syntaxhighlight lang="json" style="font-size:smaller;" class="mw-collapsible mw-collapsed">
+{ "head": {
+    "vars": [ "cspp_example" , "institutionID" , "publisher" , "graph" ]
+  } ,
+  "results": {
+    "bindings": [
+      {
+        "cspp_example": { "type": "uri" , "value": "http://coldb.mnhn.fr/catalognumber/mnhn/p/p00039900" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/03wkt5x30" } ,
+        "publisher": { "type": "uri" , "value": "https://science.mnhn.fr/institution/mnhn/collection/p/item/search" } ,
+        "graph": { "type": "uri" , "value": "http://coldb.mnhn.fr/catalognumber/mnhn/p/" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "http://data.biodiversitydata.nl/naturalis/specimen/113251" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/0566bfb96" } ,
+        "graph": { "type": "uri" , "value": "http://data.biodiversitydata.nl/naturalis/specimen/" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "http://id.herb.oulu.fi/0014586" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/03yj89h83" } ,
+        "publisher": { "type": "literal" , "value": "http://gbif.fi" } ,
+        "graph": { "type": "uri" , "value": "http://tun.fi" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "http://id.snsb.info/snsb/collection/1000/1579/1000" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/05th1v540" } ,
+        "publisher": { "type": "literal" , "value": "http://www.snsb.info" } ,
+        "graph": { "type": "uri" , "value": "http://id.snsb.info/snsb/" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "http://lagu.jacq.org/object/AM-02278" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/01j60ss54" } ,
+        "publisher": { "type": "literal" , "value": "LAGU" } ,
+        "graph": { "type": "uri" , "value": "http://lagu.jacq.org/object" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "http://specimens.kew.org/herbarium/K000989827" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/00ynnr806" } ,
+        "publisher": { "type": "uri" , "value": "https://www.kew.org" } ,
+        "graph": { "type": "uri" , "value": "http://specimens.kew.org/herbarium/" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "http://tbi.jacq.org/object/TBI1014287" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/051qn8h41" } ,
+        "publisher": { "type": "literal" , "value": "TBI" } ,
+        "graph": { "type": "uri" , "value": "http://tbi.jacq.org/object" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "http://tun.fi/MHD.107807" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/03tcx6c30" } ,
+        "publisher": { "type": "literal" , "value": "http://gbif.fi" } ,
+        "graph": { "type": "uri" , "value": "http://tun.fi" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "http://tun.fi/MKA.342315" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/05vghhr25" } ,
+        "publisher": { "type": "literal" , "value": "http://gbif.fi" } ,
+        "graph": { "type": "uri" , "value": "http://tun.fi" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "http://tun.fi/MKA.863532" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/029pk6x14" } ,
+        "publisher": { "type": "literal" , "value": "http://gbif.fi" } ,
+        "graph": { "type": "uri" , "value": "http://tun.fi" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://admont.jacq.org/ADMONT100680" } ,
+        "institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/128466393" } ,
+        "publisher": { "type": "literal" , "value": "ADMONT" } ,
+        "graph": { "type": "uri" , "value": "http://admont.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://bak.jacq.org/BAK0-0000001" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/006m4q736" } ,
+        "publisher": { "type": "literal" , "value": "BAK" } ,
+        "graph": { "type": "uri" , "value": "http://bak.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://boz.jacq.org/BOZ000001" } ,
+        "institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/128699910" } ,
+        "publisher": { "type": "literal" , "value": "BOZ" } ,
+        "graph": { "type": "uri" , "value": "http://boz.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://brnu.jacq.org/BRNU000205" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/02j46qs45" } ,
+        "publisher": { "type": "literal" , "value": "BRNU" } ,
+        "graph": { "type": "uri" , "value": "http://brnu.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://data.rbge.org.uk/herb/E00000001" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/0349vqz63" } ,
+        "publisher": { "type": "uri" , "value": "http://www.rbge.org.uk" } ,
+        "graph": { "type": "uri" , "value": "http://data.rbge.org.uk/herb/" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://dr.jacq.org/DR000023" } ,
+        "institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/155418159" } ,
+        "publisher": { "type": "literal" , "value": "DR" } ,
+        "graph": { "type": "uri" , "value": "http://dr.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://ere.jacq.org/ERE0000012" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/05mpgew40" } ,
+        "publisher": { "type": "literal" , "value": "ERE" } ,
+        "graph": { "type": "uri" , "value": "http://ere.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://gat.jacq.org/GAT0000014" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/02skbsp27" } ,
+        "publisher": { "type": "literal" , "value": "GAT" } ,
+        "graph": { "type": "uri" , "value": "http://gat.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://gjo.jacq.org/GJO0000012" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/00nxtmb68" } ,
+        "publisher": { "type": "literal" , "value": "GJO" } ,
+        "graph": { "type": "uri" , "value": "http://gjo.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://gzu.jacq.org/GZU000000208" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/01faaaf77" } ,
+        "publisher": { "type": "literal" , "value": "GZU" } ,
+        "graph": { "type": "uri" , "value": "http://gzu.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://hal.jacq.org/HAL0053120" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/05gqaka33" } ,
+        "publisher": { "type": "literal" , "value": "HAL" } ,
+        "graph": { "type": "uri" , "value": "http://hal.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://herbarium.bgbm.org/object/B100000004" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/00bv4cx53" } ,
+        "publisher": { "type": "literal" , "value": "BGBM" } ,
+        "graph": { "type": "uri" , "value": "http://herbarium.bgbm.org/object/" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://id.smns-bw.org/smns/collection/275449/772800/279829" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/05k35b119" } ,
+        "graph": { "type": "uri" , "value": "http://id.smns-bw.org/smns/collection/" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://je.jacq.org/JE00000020" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/05qpz1x62" } ,
+        "publisher": { "type": "literal" , "value": "JE" } ,
+        "graph": { "type": "uri" , "value": "http://je.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://kiel.jacq.org/KIEL0007010" } ,
+        "institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/239180770" } ,
+        "publisher": { "type": "literal" , "value": "KIEL" } ,
+        "graph": { "type": "uri" , "value": "http://kiel.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://lz.jacq.org/LZ161177" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/03s7gtk40" } ,
+        "publisher": { "type": "literal" , "value": "LZ" } ,
+        "graph": { "type": "uri" , "value": "http://lz.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://mjg.jacq.org/MJG000015" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/023b0x485" } ,
+        "publisher": { "type": "literal" , "value": "MJG" } ,
+        "graph": { "type": "uri" , "value": "http://mjg.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://pi.jacq.org/PI000648" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/03ad39j10" } ,
+        "publisher": { "type": "literal" , "value": "PI" } ,
+        "graph": { "type": "uri" , "value": "http://pi.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://prc.jacq.org/PRC2535" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/024d6js02" } ,
+        "publisher": { "type": "literal" , "value": "PRC" } ,
+        "graph": { "type": "uri" , "value": "http://prc.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://tub.jacq.org/TUB002830" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/03a1kwz48" } ,
+        "publisher": { "type": "literal" , "value": "TUB" } ,
+        "graph": { "type": "uri" , "value": "http://tub.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://ubt.jacq.org/UBT0010195" } ,
+        "institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/142509930" } ,
+        "publisher": { "type": "literal" , "value": "UBT" } ,
+        "graph": { "type": "uri" , "value": "http://ubt.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://w.jacq.org/W0000011a" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/01tv5y993" } ,
+        "publisher": { "type": "literal" , "value": "W" } ,
+        "graph": { "type": "uri" , "value": "http://w.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://wu.jacq.org/WU0000004" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/03prydq77" } ,
+        "publisher": { "type": "literal" , "value": "WU" } ,
+        "graph": { "type": "uri" , "value": "http://wu.jacq.org" }
+      } ,
+      {
+        "cspp_example": { "type": "uri" , "value": "https://www.botanicalcollections.be/specimen/BR0000005065868" } ,
+        "institutionID": { "type": "uri" , "value": "https://ror.org/01h1jbk91" } ,
+        "graph": { "type": "uri" , "value": "http://botanicalcollections.be/specimen/" }
+      }
+    ]
+  }
+}
+</syntaxhighlight>
+These data results have also missing values, the point is to extract <code>.head.vars</code> and use those values in the variable <code>$fields</code> to query all values out of <code>.results.bindings[].[].value</code>, the rest is getting a nice looking:
+<syntaxhighlight lang="bash" style="font-size:smaller;">
+# table head (and adding an index column)
+cat institutionID_20220808.json  | jq --raw-output '.head.vars | @tsv' | sed --regexp-extended 's@^@| # | @; s@$@ |@; s@[\t]@ | @g; h; s@[^|]@-@g;x;G;'
+cat institutionID_20220808.json  | jq -r '.head.vars | @tsv' | sed -r 's@^@| # | @; s@$@ |@; s@[\t]@ | @g; h; s@[^|]@-@g;x;G;'
+# sed: s@^@| # | @;  add index column-header | # | to line start
+# sed: s@$@ |@;      append closing table row at the end |
+# sed: s@[\t]@ | @g; as it has tab separated values, replace \t by | (the colums or table data cells)
+# sed: h;            put this ready formatted header into hold space buffer
+# sed: s@[^|]@-@g;   replace all but “|” by “-” to make the markdown header separation
+# sed: x;            exchange hold space buffer (formatted 1st table row) with the markdown header separation
+# sed: G;            now we have only the first table row in place and append (G) the markdown header separation by a \n
+#                    and get a nice complete table header:
+# | # | cspp_example | institutionID | publisher | graph |
+# |---|--------------|---------------|-----------|-------|
+# table body
+  # understand table data but sort them by another column (use sort --debug to find out)
+  cat institutionID_20220808.json  | jq -r '.head.vars as $fields | .results.bindings[] |  [.[($fields[])].value] |@tsv'
+    # show tab separator output
+  # we use “|” as colum separators and also to format using “|” for the column command
+  cat institutionID_20220808.json  | jq --raw-output '.head.vars as $fields | .results.bindings[] |  [.[($fields[])].value] |@tsv' \
+    | sed --regexp-extended 's@^@| @; s@$@ |@; s@[\t]@ | @g;' | column --table --separator '|' --output-separator '|' | sort --field-separator='|' --key=5.1b --debug
+  # short options
+  cat institutionID_20220808.json  | jq -r '.head.vars as $fields | .results.bindings[] |  [.[($fields[])].value] |@tsv' \
+    | sed -r 's@^@| @; s@$@ |@; s@[\t]@ | @g;' | column -t -s '|' -o '|' | sort -t '|' -k5.1b --debug
+    # sort -t '|' -k5.1b using | as table sort separator, and based on that sort the 5th field, use 1st character in the 5th field to line end,
+    # -k5.1b ignore b(lanks)
+    # -k5.1Vb version sort, ignore b(lanks)
+    # -k5.1n natural sort (aso.)
+  # table body (and adding an index column (sed -r "=;"))
+  # short options
+  cat institutionID_20220808.json | jq --raw-output '.head.vars as $fields | .results.bindings[] |  [.[($fields[])].value] |@tsv' \
+    | sed --regexp-extended 's@^@| @; s@$@ |@; s@[\t]@ | @g;' | column --table --separator '|' --output-separator '|' | sort --field-separator='|' --key=5.1b \
+    | sed "=" | sed --regexp-extended "/^[[:digit:]]/{ N; s@(^[[:digit:]]+)\n@| \1 @; }"
+  cat institutionID_20220808.json | jq -r '.head.vars as $fields | .results.bindings[] |  [.[($fields[])].value] |@tsv' \
+    | sed -r 's@^@| @; s@$@ |@; s@[\t]@ | @g;' | column -t -s '|' -o '|' | sort -t '|' -k5.1b \
+    | sed "=" | sed -r "/^[[:digit:]]/{ N; s@(^[[:digit:]]+)\n@| \1 @; }"
+  # | 1 | https://admont.jacq.org/ADMONT100680                           | http://viaf.org/viaf/128466393   | ADMONT | http://admont.jacq.org                     |
+  # | 2 | https://bak.jacq.org/BAK0-0000001                              | https://ror.org/006m4q736        | BAK    | http://bak.jacq.org                        |
+  # | 3 | https://www.botanicalcollections.be/specimen/BR0000005065868   | https://ror.org/01h1jbk91        |        | http://botanicalcollections.be/specimen/   |
+  # | … | …                                                              | …                                | …      | …                                          |
+</syntaxhighlight>
+== Sed (kurz für ''stream editor'')<span id="sed_stream_editor"></span> ==
+Anleitungen
+- https://snipcademy.com/shell-scripting-sed – gute Schemadarstellungen der Kommandoabfolgen
+== Functions for BASH-Programming ==
+<syntaxhighlight lang="bash" style="font-size:smaller;">
+comment_exit_code() {
+    # unused
+    # -------------------------------
+    # Usage:
+    #   comment_exit_code $exit_code
+    #   comment_exit_code $exit_code "Some more exact comment what was done"
+    # -------------------------------
+    local this_exit_code=$1
+    local this_comment=${2-}
+    case $this_exit_code in [1-9]|[1-9][0-9]|[1-9][0-9][0-9])
+      if [[ "${#this_comment}" -lt 1 ]];then
+      echo -e "${ORANGE}Something unexpected happened. Exit Code: ${this_exit_code} $(kill -l $this_exit_code)${NOFORMAT}"
+      else
+      echo -e "${ORANGE}Something unexpected happened: ${this_comment}. Exit Code: ${this_exit_code} $(kill -l $this_exit_code)${NOFORMAT}"
+      fi
+      ;;
+    esac
+}
+repeat_text() {
+    # -------------------------------
+    # Usage:
+    #   repeat_text n-times text
+    #   repeat_text 10 '.'
+    #     prints 10 dots: ..........
+    #   repeat_text 10 '.' storingvariablename
+    #     stores 10 dots to $storingvariablename
+    # -------------------------------
+    # $1=number of patterns to repeat
+    # $2=pattern
+    # $3=output variable name
+    local tmp
+    local local_1=$1
+    local local_2=$2
+    local local_3=${3-}
+    printf -v tmp '%*s' "$local_1"
+    if [[ "$local_3" ]];then
+      printf -v "$local_3" '%s' "${tmp// /$local_2}"
+    else
+      printf '%s' "${tmp// /$local_2}"
+    fi
+}
+setup_colors() {
+  # 0 - Normal Style; 1 - Bold; 2 - Dim; 3 - Italic; 4 - Underlined; 5 - Blinking; 7 - Reverse; 8 - Invisible;
+  if [[ -t 2 ]] && [[ -z "${NO_COLOR-}" ]] && [[ "${TERM-}" != "dumb" ]]; then
+    NOFORMAT='\033[0m'
+    BOLD='\033[1m' ITALIC='\033[3m'
+    BLUE='\033[0;34m' BLUE_BOLD='\033[1;34m' BLUE_ITALIC='\033[3;34m'
+    CYAN='\033[0;36m' CYAN_BOLD='\033[1;36m' CYAN_ITALIC='\033[3;36m'
+    GREEN='\033[0;32m' GREEN_BOLD='\033[1;32m' GREEN_ITALIC='\033[3;32m'
+    ORANGE='\033[0;33m' ORANGE_BOLD='\033[1;33m' ORANGE_ITALIC='\033[3;33m'
+    PURPLE='\033[0;35m' PURPLE_BOLD='\033[1;35m' PURPLE_ITALIC='\033[3;35m'
+    RED='\033[0;31m' RED_BOLD='\033[1;31m' RED_ITALIC='\033[3;31m'
+    YELLOW='\033[1;33m' YELLOW_BOLD='\033[1;33m' YELLOW_ITALIC='\033[3;33m'
+  else
+    NOFORMAT=''
+    BOLD='' ITALIC=''
+    BLUE='' BLUE_BOLD='' BLUE_ITALIC=''
+    CYAN='' CYAN_BOLD='' CYAN_ITALIC=''
+    GREEN='' GREEN_BOLD='' GREEN_ITALIC=''
+    ORANGE='' ORANGE_BOLD='' ORANGE_ITALIC=''
+    PURPLE='' PURPLE_BOLD='' PURPLE_ITALIC=''
+    RED='' RED_BOLD='' RED_ITALIC=''
+    YELLOW='' YELLOW_BOLD='' YELLOW_ITALIC=''
+  fi
+}
+setup_colors
+test_dependencies() {
+  local exit_level=0
+  if ! [[ -e "${working_directory}/${file_input}" ]];then
+    echo -e "${ORANGE}# We can not find the data in ${NOFORMAT}${working_directory}/${file_input}${ORANGE} (stop)${NOFORMAT}";
+    exit_level=1;
+  fi
+  if ! [[ -x "$(command -v $bin_dwcagent)" ]]; then
+    printf "${ORANGE}Command${NOFORMAT} $bin_dwcagent ${ORANGE} to parse names were not found. See https://libraries.io/rubygems/dwc_agent${NOFORMAT}\n"; exit_level=1;
+  fi
+  if ! [[ -x "$(command -v awk)" ]]; then
+    printf "${ORANGE}Command${NOFORMAT} awk ${ORANGE} to read the data was not found. Please install it in your software management system.${NOFORMAT}\n"; exit_level=1;
+    exit_level=1;
+  fi
+  case $exit_level in [1-9])
+    printf "${ORANGE}(stop)${NOFORMAT}\n";
+    exit 1;;
+  esac
+}
+test_dependencies
+processinfo() {
+  echo -e "${GREEN}# ---------------------------- ${NOFORMAT}"
+  echo -e "${GREEN}# Description: We read ${NOFORMAT}${file_input}${GREEN} and search for name lists of multiple names (in this case: containing an ampersand &) …${NOFORMAT}"
+  echo -e "${GREEN}# We would parse it with ${NOFORMAT}${bin_dwcagent}${GREEN} and …${NOFORMAT}"
+  echo -e "${GREEN}# … would write all parsed names into …${NOFORMAT}"
+  echo -e "${GREEN}#   ${NOFORMAT}${file_output}"
+  echo -e "${GREEN}#   ${NOFORMAT}${file_output_unique}"
+  echo -e "${GREEN}# We would parse names from single text lines, which is slow but overall more accurate.${NOFORMAT}"
+  echo -e "${GREEN}#   ($N_parallel parallel executions of dwcagent)${NOFORMAT}"
+}
+</syntaxhighlight>
+== Date and Time ==
+<syntaxhighlight lang="bash" style="font-size:smaller;">
+# seconds to days hours min sec (→ https://unix.stackexchange.com/a/338844 “bash - Displaying seconds as days/hours/mins/seconds?”)
+seconds="755";date --utc --date="@$seconds" +"$(( $seconds/3600/24 )) days %H hours %Mmin %Ssec"
+</syntaxhighlight>
+Calculate time process
+<syntaxhighlight lang="bash" style="font-size:smaller;">
+#!/bin/bash
+if ! command -v datediff &> /dev/null &&  ! command -v dateutils.ddiff &> /dev/null
+then
+  echo -e "\e[31m# Error: Neither command datediff or dateutils.ddiff could not be found. Please install package dateutils.\e[0m"
+  do_exit=1
+else
+  if ! command -v datediff &> /dev/null
+  then
+    # echo "Command dateutils.ddiff found"
+    exec_datediff="dateutils.ddiff"
+  elif ! command -v dateutils.ddiff &> /dev/null
+    then
+      # echo "Command datediff found"
+      exec_datediff="datediff"
+  fi
+fi
+datetime_start=`date --rfc-3339 'ns'` ;
+echo "Sleep for 5 seconds… or some other process is going on …"; sleep 5; echo "Completed";
+datetime_end=`date --rfc-3339 'ns'`;
+echo $( date --date="$datetime_start" '+# Started: %Y-%m-%d %H:%M:%S%:z' )
+echo $( date --date="$datetime_end"   '+# Ended:   %Y-%m-%d %H:%M:%S%:z' )
+#   echo "# Started: $datetime_start"
+#   echo "# Ended:   $datetime_end"
+$exec_datediff "$datetime_start" "$datetime_end" -f "# Done. This took %dd  %0Hh:%0Mm:%0Ss to do something"
+</syntaxhighlight>
+<syntaxhighlight lang="bash" style="font-size:smaller;">
+get_timediff_for_njobs_new () {
+  # Description: calculate estimated time to finish n jobs and the estimated total time
+  # ---------------------------------
+  # Dependency: package dateutils
+  # ---------------------------------
+  # Usage:
+  # get_timediff_for_njobs_new --test # to check for dependencies (datediff)
+  # get_timediff_for_njobs_new begintime nowtime ntotaljobs njobsnowdone
+  # get_timediff_for_njobs_new "2021-12-06 16:47:29" "2021-12-09 13:38:08" 696926 611613
+  # ---------------------------------
+  # echo '('`date +"%s.%N"` ' * 1000)/1' | bc # get milliseconds
+  # echo '('`date +"%s.%N"` ' * 1000000)/1' | bc # get nanoseconds
+  # echo $( date --rfc-3339 'ns' ) | ( read -rsd '' x; echo ${x@Q} ) # escaped
+  # ---------------------------------
+  local this_command_timediff
+  # read if test mode to check commands
+  while [[ "$#" -gt 0 ]]
+  do
+    case $1 in
+      -t|--test)
+        doexit=0
+        if ! command -v datediff &> /dev/null &&  ! command -v dateutils.ddiff &> /dev/null
+        then
+          echo -e "# \e[31mError: Neither command datediff or dateutils.ddiff could not be found. Please install package dateutils.\e[0m"
+          doexit=1
+        fi
+        if ! command -v sed &> /dev/null
+        then
+          echo -e "# \e[31mError: command sed (stream editor) could not be found. Please install package sed.\e[0m"
+          doexit=1
+        fi
+        if ! command -v bc &> /dev/null
+        then
+          echo -e "# \e[31mError: command bc (arbitrary precision calculator) could not be found. Please install package bc.\e[0m"
+          doexit=1
+        fi
+        if [[ $doexit -gt 1 ]];then
+          exit;
+        else
+          return 0 # (return 0 seems success?) and exit function
+        fi
+      ;;
+      *)
+      break
+      ;;
+    esac
+  done
+  if ! command -v datediff &> /dev/null
+  then
+    # echo "Command dateutils.ddiff found"
+    this_command_timediff="dateutils.ddiff"
+  elif ! command -v dateutils.ddiff &> /dev/null
+    then
+      # echo "Command datediff found"
+      this_command_timediff="datediff"
+  fi
+  # START estimate time to do
+  # convert also "2022-06-30_14h56m10s" to "2022-06-30 14:56:10"
+  this_given_start_time=$( echo $1 | sed -r 's@([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2})[_[:space:]-]([[:digit:]]{2})h([[:digit:]]{2})m([[:digit:]]{2})s@\1 \2:\3:4@' )
+  this_given_now_time=$(   echo $2 | sed -r 's@([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2})[_[:space:]-]([[:digit:]]{2})h([[:digit:]]{2})m([[:digit:]]{2})s@\1 \2:\3:4@' )
+  local this_unixnanoseconds_start_timestamp=$(date --date="$this_given_start_time" '+%s.%N')
+  local this_unixnanoseconds_now=$(date --date="$this_given_now_time" '+%s.%N')
+  local this_unixseconds_todo=0
+  local this_n_jobs_all=$(expr $3 + 0)
+  local this_i_job_counter=$(expr $4 + 0)
+  # echo "scale=10; 1642073008.587244684 - 1642028400.000000000" | bc -l
+  local this_timediff_unixnanoseconds=`echo "scale=10; $this_unixnanoseconds_now - $this_unixnanoseconds_start_timestamp" | bc -l`
+  # $(( $this_unixnanoseconds_now - $this_unixnanoseconds_start_timestamp ))
+  local this_n_jobs_todo=$(( $this_n_jobs_all - $this_i_job_counter ))
+  local this_msg_estimated_sofar=""
+  # echo -e "\033[2m# DEBUG Test mode: all together $this_n_jobs_all ; counter $this_i_job_counter\033[0m"
+  if [[ $this_n_jobs_all -eq $this_i_job_counter ]];then # done
+    this_unixseconds_todo=0
+    # njobs_done_so_far=`$this_command_timediff "@$this_unixnanoseconds_start_timestamp" "@$this_unixnanoseconds_now" -f "all $this_i_job_counter done, duration %dd %0Hh:%0Mm:%0Ss"`
+    this_msg_estimated_sofar="nothing left to do"
+  else
+    # this_unixseconds_todo=$(( $this_timediff_unixnanoseconds * $this_n_jobs_todo / $this_i_job_counter ))
+    # this_unixseconds_todo=$(( $this_timediff_unixnanoseconds * $this_n_jobs_todo / $this_i_job_counter ))
+    this_unixseconds_todo=`echo "scale=0; $this_timediff_unixnanoseconds * $this_n_jobs_todo / $this_i_job_counter" | bc -l`
+    job_singular_or_plural=$([ $this_n_jobs_todo -gt 1 ]  && echo jobs  || echo job )
+    if [[ $this_unixseconds_todo -ge $(( 60 * 60 * 24 * 2 )) ]];then
+      this_msg_estimated_sofar=`$this_command_timediff "@0" "@$this_unixseconds_todo" -f "Still $this_n_jobs_todo $job_singular_or_plural to do, estimated end %0ddays %0Hh:%0Mmin:%0Ssec"`
+    elif [[ $this_unixseconds_todo -ge $(( 60 * 60 * 24 )) ]];then
+      this_msg_estimated_sofar=`$this_command_timediff "@0" "@$this_unixseconds_todo" -f "Still $this_n_jobs_todo $job_singular_or_plural to do, estimated end %0dday %0Hh:%0Mmin:%0Ssec"`
+    elif [[ $this_unixseconds_todo -ge $(( 60 * 60 * 1 )) ]];then
+      this_msg_estimated_sofar=`$this_command_timediff "@0" "@$this_unixseconds_todo" -f "Still $this_n_jobs_todo $job_singular_or_plural to do, estimated end %0Hh:%0Mmin:%0Ssec"`
+    elif [[ $this_unixseconds_todo -lt $(( 60 * 60 * 1 )) ]];then
+      this_msg_estimated_sofar=`$this_command_timediff "@0" "@$this_unixseconds_todo" -f "Still $this_n_jobs_todo $job_singular_or_plural to do, estimated end %0Mmin:%0Ssec"`
+    fi
+  fi
+  this_unixseconds_done=`printf "%.0f" $(echo "scale=0; $this_unixnanoseconds_now - $this_unixnanoseconds_start_timestamp" | bc -l)`
+  this_unixseconds_total=`printf "%.0f" $(echo "scale=0; $this_unixseconds_done + $this_unixseconds_todo" | bc -l)`
+  if [[ $this_unixseconds_total -ge $(( 60 * 60 * 24 * 2 )) ]];then
+    this_msg_time_total=`$this_command_timediff "@0" "@$this_unixseconds_total" -f "total time: %0ddays %0Hh:%0Mmin:%0Ssec"`
+  elif [[ $this_unixseconds_total -ge $(( 60 * 60 * 24 )) ]];then
+    this_msg_time_total=`$this_command_timediff "@0" "@$this_unixseconds_total" -f "total time: %0dday %0Hh:%0Mmin:%0Ssec"`
+  elif [[ $this_unixseconds_total -ge $(( 60 * 60 * 1 )) ]];then
+    this_msg_time_total=`$this_command_timediff "@0" "@$this_unixseconds_total" -f "total time: %0Hh:%0Mmin:%0Ssec"`
+  elif [[ $this_unixseconds_total -lt $(( 60 * 60 * 1 )) ]];then
+    this_msg_time_total=`$this_command_timediff "@0" "@$this_unixseconds_total" -f "total time: %0Mmin:%0Ssec"`
+  fi
+  if ! [[ $this_unixseconds_todo -eq 0 ]];then this_msg_time_total="estimated $this_msg_time_total"; fi
+  #echo "from $this_n_jobs_all, $njobs_done_so_far; $this_msg_estimated_sofar"
+  echo "${this_msg_estimated_sofar} (${this_msg_time_total})"
+  # END estimate time to do
+}
+export -f get_timediff_for_njobs_new # export needed otherwise /usr/bin/bash: get_timediff_for_njobs_new: command not found
+get_timediff_for_njobs_new --test
+</syntaxhighlight>
+{{Literaturverzeichnis}}

Benutzer:Andreas Plank/BASH: Unterschied zwischen den Versionen

Neueste Überarbeitung vom 31. Oktober 2024, 12:03 Uhr

Dateinamen auffinden

Sortierte Dateien vergleichen

Compare Two URL Lists

Summieren von Zahlen, Listen

BASH kurz-Optionen zu lang-Optionen ersetzen (automatisiert aus Handbuch-Dokumentation (man pages))

Redirect Errors/Standard Output

Parameter Substitution

Command substitution

Extended mode

Substrings

For Loops

Useful Commands

sort

jq ~ Get Markdown Table from Dynamic Data Values

Sed (kurz für stream editor)

Functions for BASH-Programming

Date and Time

Weiterführende Literatur

Navigationsmenü

Suche

jq ~ Get Markdown Table from Dynamic Data Values