Benutzer:Andreas Plank/BASH
Short summaries of important recommendations:
Finding file names (glob patterns)
? # exactly one arbitrary character
* # any number (including 0) of arbitrary characters
[def] # one of the listed characters
[^def] # none of the listed characters
[!def] # same as above
[a-d] # any character from the range
ls -d /[a-d]* # directories → /bin /boot /dev
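The patterns above can be tried out safely in a throw-away directory. The following is a minimal sketch, assuming made-up file names (`a1`, `ab`, `b2`, …) that exist only for this illustration; `LC_ALL=C` is set so that the expansion order does not depend on the local collation rules.

```shell
#!/usr/bin/env bash
export LC_ALL=C                      # predictable ordering of the expanded names
demo_dir=$(mktemp -d)                # hypothetical scratch directory
cd "$demo_dir" || exit 1
touch a1 ab b2 c3 d4 e5              # hypothetical demo files

two_chars=$(echo ??)                 # exactly two arbitrary characters
letter_digit=$(echo ?[0-9])          # one arbitrary character followed by a digit
range=$(echo [a-c]*)                 # names starting with a, b or c
negated=$(echo [!a-c]*)              # names NOT starting with a, b or c
listed=$(echo [ad]?)                 # names starting with a or d, two characters long

printf '%s\n' "$two_chars" "$letter_digit" "$range" "$negated" "$listed"
cd / && rm -r "$demo_dir"            # clean up
```

Note that a pattern without any match is left unchanged by default, which is why every pattern in this sketch was chosen to match at least one file.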
Comparing sorted files
Show the contents of both files (in the sample data, the suffix "-1" marks lines found only in file 1, "-2" lines found only in file 2, and "-beide" lines found in both):
cat Datei_1.txt # should be sorted
drei-beide
eins-1
fünf -1
fünf-beide
zwei-1
cat Datei_2.txt # should be sorted
acht-2
drei-beide
fünf-beide
sechs-2
zweilerlei-2
zweitens-2
Default output:
comm Datei_1.txt Datei_2.txt # prints 3 columns (separated by tabs)
	acht-2
		drei-beide
eins-1
fünf -1
		fünf-beide
	sechs-2
zwei-1
	zweilerlei-2
	zweitens-2
The columns mean:
- Column 1: lines found only in Datei_1.txt
- Column 2: lines found only in Datei_2.txt
- Column 3: lines found in both files
The comm command can suppress any of these three output columns via options:
- comm -1: suppress output column 1 (remaining: lines only in Datei_2 + lines in both)
- comm -12: suppress output columns 1 + 2 (remaining: lines in both)
- comm -13: suppress output columns 1 + 3 (remaining: lines only in Datei_2)
- and so on
comm -23 Datei_1.txt Datei_2.txt
# suppress output columns 2 + 3, keep column 1: lines found only in Datei_1
eins-1
fünf -1
zwei-1
comm -13 Datei_1.txt Datei_2.txt
# suppress output columns 1 + 3, keep column 2: lines found only in Datei_2
acht-2
sechs-2
zweilerlei-2
zweitens-2
comm -12 Datei_1.txt Datei_2.txt
# suppress output columns 1 + 2, keep column 3: lines found in both Datei_1 and Datei_2
drei-beide
fünf-beide
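The three most common column selections can be reproduced with a self-contained sketch. The file names `list_1.txt` and `list_2.txt` and their contents are assumptions made up for this illustration; both inputs are sorted under `LC_ALL=C`, because comm expects its inputs to be sorted in the collation order currently in effect.

```shell
#!/usr/bin/env bash
export LC_ALL=C                       # same collation for sort and comm
work_dir=$(mktemp -d)                 # hypothetical scratch directory

# two small, hypothetical word lists
printf '%s\n' fig apple cherry        | sort > "$work_dir/list_1.txt"
printf '%s\n' grape cherry banana fig | sort > "$work_dir/list_2.txt"

only_in_1=$(comm -23 "$work_dir/list_1.txt" "$work_dir/list_2.txt")  # keep column 1
only_in_2=$(comm -13 "$work_dir/list_1.txt" "$work_dir/list_2.txt")  # keep column 2
in_both=$(comm -12 "$work_dir/list_1.txt" "$work_dir/list_2.txt")    # keep column 3

echo "only in list_1:"; echo "$only_in_1"
echo "only in list_2:"; echo "$only_in_2"
echo "in both:";        echo "$in_both"
rm -r "$work_dir"
```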
Compare Two URL Lists
Assume you have two lists of URLs, one old and one new, and you want only those URLs that are actually new compared to the old list. The following example assumes CSV (comma-separated values) or TSV (tab-separated values) input and extracts just the URL, regardless of any text after it.
For this we use the command:
comm ‹-options› oldlist_sorted comparelist_sorted
or
comm ‹-options› file_1_sorted file_2_sorted
which produces 3 output columns:
column-1 | column-2 | column-3 |
---|---|---|
only-in-file_1 | only-in-file_2 | in-file_1-and-2 |
Using the options of comm, you can suppress one or two of these three output columns:
- comm -1: suppress output column 1 (results left: columns 2 and 3, i.e. only-in-file_2 and in-file_1-and-2)
- comm -12: suppress output columns 1 + 2 (results left: column 3, i.e. in-file_1-and-2)
- comm -13: suppress output columns 1 + 3 (results left: column 2, i.e. only-in-file_2)
- comm -23: suppress output columns 2 + 3 (results left: column 1, i.e. only-in-file_1)
- and so on
# # # # # # # # # # # # # #
# Check for URI-Differences old list vs. new list (in general for CSV or TSV lists)
# ```bash
# comm file1.txt file2.txt
# LIST1-only-of-file1 LIST2-only-of-file2 LIST3-both-in-and-of-file1-file2
#
# comm -13 donelistsorted comparelistsorted > todolistsorted
donelist_source=urilist_Naturalis_20220516.csv;
donelist_sorted=${donelist_source%.*}_sorted.tsv;
donelist_sorted_noprotocol=${donelist_source%.*}_sorted_noprotocol.tsv;
comparelist_source=urilist_Naturalis_20220817.tsv;
comparelist_sorted=${comparelist_source%.*}_sorted.tsv;
comparelist_sorted_noprotocol=${comparelist_source%.*}_sorted_noprotocol.tsv;
todolist_sorted=${comparelist_source%.*}_todo.tsv;
todolist_sorted_noprotocol=${comparelist_source%.*}_todo_noprotocol.tsv;
# assumes CSV (comma-separated values) or TSV (tab-separated values)
# assumes each URL starts at the beginning of the line; any text after the URL (from the word boundary on) is ignored
# compare by removing any protocol part (http:// https:// ftp:// sftp:// etc.) and any enclosing <…>
# without protocol
sed --silent --regexp-extended '/[[:alpha:]]+:\/\// { s@[[:space:]]*<?[[:alpha:]]+://([^[:space:],]+)>?\b.*$@\1@; p }' "$donelist_source" | sort > "$donelist_sorted_noprotocol"
sed --silent --regexp-extended '/[[:alpha:]]+:\/\// { s@[[:space:]]*<?[[:alpha:]]+://([^[:space:],]+)>?\b.*$@\1@; p }' "$comparelist_source" | sort > "$comparelist_sorted_noprotocol"
# only in done-list
# comm -23 "$donelist_sorted_noprotocol" "$comparelist_sorted_noprotocol" > "$todolist_sorted_noprotocol";
# only in compare-list (the URLs that are actually new, i.e. still to do)
comm -13 "$donelist_sorted_noprotocol" "$comparelist_sorted_noprotocol" > "$todolist_sorted_noprotocol";
grep --count "/" "$todolist_sorted_noprotocol";
# with protocol
sed --silent --regexp-extended '/[[:alpha:]]+:\/\// { s@[[:space:]]*<?([[:alpha:]]+://[^[:space:],]+)>?\b.*$@\1@; p }' "$donelist_source" | sort > "$donelist_sorted"
sed --silent --regexp-extended '/[[:alpha:]]+:\/\// { s@[[:space:]]*<?([[:alpha:]]+://[^[:space:],]+)>?\b.*$@\1@; p }' "$comparelist_source" | sort > "$comparelist_sorted"
# only in done-list
# comm -23 "$donelist_sorted" "$comparelist_sorted" > "$todolist_sorted";
# only in compare-list (the URLs that are actually new, i.e. still to do)
comm -13 "$donelist_sorted" "$comparelist_sorted" > "$todolist_sorted";
grep --count "/" "$todolist_sorted";
# ```
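The idea of the script above can be condensed into a small sketch that runs on its own. Everything specific in it is an assumption for illustration: the `example.org` URLs, the file names `old.csv` and `new.tsv`, and the helper `strip_protocol`, whose regular expression is a simplified variant of the one used above (it works without the GNU-only `\b`).

```shell
#!/usr/bin/env bash
export LC_ALL=C                       # same collation for sort and comm
work_dir=$(mktemp -d)                 # hypothetical scratch directory

# hypothetical input: URL first, further columns after a comma or a tab
printf '%s\n' 'http://example.org/a,done' 'https://example.org/b,done' > "$work_dir/old.csv"
printf '%s\t%s\n' 'https://example.org/a' 'x' \
                  'http://example.org/c' 'y' \
                  '<https://example.org/b>' 'z' > "$work_dir/new.tsv"

strip_protocol() {
  # print only lines that start with a URL; drop an optional "<",
  # the protocol part, and everything following the URL
  sed -n -E 's@^[[:space:]]*<?[[:alpha:]]+://([^[:space:],>]+).*$@\1@p' "$1" | sort -u
}

strip_protocol "$work_dir/old.csv" > "$work_dir/old_sorted.txt"
strip_protocol "$work_dir/new.tsv" > "$work_dir/new_sorted.txt"

# column 2 of comm: entries found only in the new list
todo=$(comm -13 "$work_dir/old_sorted.txt" "$work_dir/new_sorted.txt")
echo "$todo"
rm -r "$work_dir"
```

Because the protocol is stripped before comparing, `http://example.org/a` and `https://example.org/a` count as the same entry.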
Summing numbers, lists
# we want to add up the column of file sizes: 1798891, 2804087, etc.
# Dependency: awk (processes text fields and files)
# Dependency: bc (an arbitrary-precision calculator language)
ls -l *importsplit* | head -n 5
# -rw-r--r-- 1 myusername myusername 1798891 Jun 30 16:32 Thread-01_botanicalcollections.be_20220509-1108_importsplit_01.rdf._normalized.ttl.trig.gz
# -rw-r--r-- 1 myusername myusername 2804087 Jun 30 16:32 Thread-01_botanicalcollections.be_20220509-1546_importsplit_01.rdf._normalized.ttl.trig.gz
# -rw-r--r-- 1 myusername myusername 862051 Jun 30 16:32 Thread-01_botanicalcollections.be_20220509-1546_importsplit_02.rdf._normalized.ttl.trig.gz
# -rw-r--r-- 1 myusername myusername 2276286 Jun 30 16:32 Thread-01_botanicalcollections.be_20220511-1106_importsplit_01.rdf._normalized.ttl.trig.gz
# -rw-r--r-- 1 myusername myusername 692749 Jun 30 16:32 Thread-01_botanicalcollections.be_20220511-1106_importsplit_02.rdf._normalized.ttl.trig.gz
ls -l *importsplit* | awk '{ print $5 }' | paste --serial --delimiters=+ - | bc
# 362150562
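Where bc is not installed, the same sum can be computed with awk alone or with shell arithmetic. The following sketch uses only the five file sizes shown in the listing above (not the full set behind the total of 362150562); the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# the five sizes from the listing above, one per line
sizes='1798891
2804087
862051
2276286
692749'

# variant 1: let awk keep a running total
sum_awk=$(printf '%s\n' "$sizes" | awk '{ sum += $1 } END { print sum }')

# variant 2: join the lines with "+" and evaluate with shell arithmetic
expression=$(printf '%s\n' "$sizes" | paste -s -d+ -)
sum_shell=$(( $expression ))

echo "$expression = $sum_shell (awk: $sum_awk)"
```

Shell arithmetic is limited to integers of fixed width, so bc remains the safer choice for very large or fractional numbers.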
Replacing BASH short options with long options (automated from the manual (man pages))
# from the `iptables` manual, pick out the options …
# [!] -s, --long-option as well as …
# -s, --long-option …
# … and turn them into sed commands, nicely aligned in columns
man iptables | grep -i --extended-regexp -- '^([[:space:]]*|[[:space:]]*[\[\]!]*[[:space:]]*)-[[:digit:][:alpha:]],[[:space:]]*--' \
| sed --regexp-extended 's/.*(-[[:alnum:]]),[[:space:]](--[[:alnum:]]+-?[[:alnum:]]+\b).*/s@ \1 @ \2 @g; #§marker§ &/g;' \
| sort --ignore-case | column -s '#' -t | sed 's@§marker§@#@'
# Usage: sed --file=short-options2long-options4rules.v4.sed rules.v4 # print the changes to the screen
# Usage: sed --in-place --file=short-options2long-options4rules.v4.sed rules.v4 # replace without backup
# Usage: sed --in-place=.backup_20220802 --file=short-options2long-options4rules.v4.sed rules.v4 # replace with backup: rules.v4.backup_20220802
# man iptables | grep -i --extended-regexp -- '^([[:space:]]*|[[:space:]]*[\[\]!]*[[:space:]]*)-[[:digit:][:alpha:]],[[:space:]]*--' \
# | sed -r 's/.*(-[[:alnum:]]),[[:space:]](--[[:alnum:]]+-?[[:alnum:]]+\b).*/s@ \1 @ \2 @g; # &/g;' \
# | sort --ignore-case
# man iptables | grep -i --extended-regexp -- '^([[:space:]]*|[[:space:]]*[\[\]!]*[[:space:]]*)-[[:digit:][:alpha:]],[[:space:]]*--' \
# | sed -r 's/.*(-[[:alnum:]]),[[:space:]](--[[:alnum:]]+-?[[:alnum:]]+\b).*/s@ \1 @ \2 @g; #§marker§ &/g;' \
# | sort --ignore-case | column -s '#' -t | sed 's@§marker§@#@'
s@ -4 @ --ipv4 @g; # -4, --ipv4
s@ -6 @ --ipv6 @g; # -6, --ipv6
s@^-A @--append @g;
s@ -A @ --append @g; # -A, --append chain rule-specification
s@ -C @ --check @g; # -C, --check chain rule-specification
s@ -c @ --set-counters @g; # -c, --set-counters packets bytes
s@ -D @ --delete @g; # -D, --delete chain rulenum ... -D, --delete chain rule-specification
s@ -d @ --destination @g; # [!] -d, --destination address[/mask][,...]
s@ -E @ --rename-chain @g; # -E, --rename-chain old-chain new-chain
s@ -F @ --flush @g; # -F, --flush [chain]
s@ -f @ --fragment @g; # [!] -f, --fragment
s@ -g @ --goto @g; # -g, --goto chain
s@ -i @ --in-interface @g; # [!] -i, --in-interface name
s@ -I @ --insert @g; # -I, --insert chain [rulenum] rule-specification
s@ -j @ --jump @g; # -j, --jump target
s@ -L @ --list @g; # -L, --list [chain]
s@ -m @ --match @g; # -m, --match match
s@ -N @ --new-chain @g; # -N, --new-chain chain
s@ -n @ --numeric @g; # -n, --numeric
s@ -o @ --out-interface @g; # [!] -o, --out-interface name
s@ -P @ --policy @g; # -P, --policy chain target
s@ -p @ --protocol @g; # [!] -p, --protocol protocol
s@ -R @ --replace @g; # -R, --replace chain rulenum rule-specification
s@ -S @ --list-rules @g; # -S, --list-rules [chain]
s@ -s @ --source @g; # [!] -s, --source address[/mask][,...]
s@ -t @ --table @g; # -t, --table table
s@ -v @ --verbose @g; # -v, --verbose
s@ -V @ --version @g; # -V, --version
s@ -w @ --wait @g; # -w, --wait [seconds]
s@ -X @ --delete-chain @g; # -X, --delete-chain [chain]
s@ -x @ --exact @g; # -x, --exact
s@ -Z @ --zero @g; # -Z, --zero [chain [rulenum]]
Redirect Errors/Standard Output
See: https://www.thomas-krenn.com/de/wiki/Bash_stdout_und_stderr_umleiten
Function | Bash redirection |
---|---|
redirect stdout -> file | programm > Datei.txt |
redirect stderr -> file | programm 2> Datei.txt |
redirect stdout AND stderr -> file | programm &> Datei.txt |
redirect stdout -> file AND stderr -> another file | programm > Datei_stdout.txt 2> Datei_stderr.txt |
stdout -> stderr | programm 1>&2 |
stderr -> stdout | programm 2>&1 |
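The effect of these redirections can be observed by capturing the output of a command that writes to both streams. In this sketch the function `talk` is a hypothetical stand-in for `programm`.

```shell
#!/usr/bin/env bash
# hypothetical command that writes one line to each stream
talk() {
  echo "to stdout"
  echo "to stderr" >&2
}

only_stdout=$(talk 2> /dev/null)       # stderr is discarded
only_stderr=$(talk 2>&1 > /dev/null)   # stderr goes to the capture, stdout is discarded
both=$(talk 2>&1)                      # stderr is merged into stdout

printf '[%s]\n' "$only_stdout" "$only_stderr" "$both"
```

Redirections are processed from left to right, which is why `2>&1 > /dev/null` captures only stderr: at the moment `2>&1` is applied, stdout still points to the capture.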
Parameter Substitution
echo {a,b}{1,2,3} # a1 a2 a3 b1 b2 b3
# compare the contents of archives
diff <(tar tzf Buch1.tar.gz) <(tar tzf Buch.tar.gz)
d='message'
echo $d # → message
echo ${d} # → message
# ${d-default}: if d is unset, expand to the default; d itself is not assigned
# (with d='message' set as above these print "message"; after `unset d` they print the default)
echo ${d-default} # → default
echo ${d-'*'} # → *
echo ${d-$1} # → value of d, or the first positional parameter if d is unset
# ${d=default}: if d is unset, assign the default to d and expand to it
echo ${d=default}
# ${d?message}: if d is unset, print the message and abandon the procedure
echo ${d?message}
# A shell procedure that requires some parameters to be set might start as follows:
: ${user?} ${acct?} ${bin?}
# will print something like: "bash: user: parameter null or not set"
${string/substring/replacement} # replaces the first match
${string//substring/replacement} # replaces all matches
${string#substring} # Deletes shortest match of $substring from front of $string.
${string##substring} # Deletes longest match of $substring from front of $string.
${string%substring} # Deletes shortest match of $substring from back of $string.
${string%%substring} # Deletes longest match of $substring from back of $string.
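A typical use of these expansions is taking a file path apart. The path and the variable names in this sketch are made up for illustration.

```shell
#!/usr/bin/env bash
path='/data/urilist_20220817.tar.gz'   # hypothetical file path

file_name=${path##*/}          # delete longest match of "*/" from the front
base_name=${file_name%%.*}     # delete longest match of ".*" from the back
without_last=${file_name%.*}   # delete shortest match of ".*" from the back
extension=${file_name#*.}      # delete shortest match of "*." from the front
first_dot=${file_name/./_}     # replace the first dot
all_dots=${file_name//./_}     # replace all dots

unset maybe_unset
fallback=${maybe_unset-default}   # variable is unset, so the default is used

printf '%s\n' "$file_name" "$base_name" "$without_last" "$extension" \
              "$first_dot" "$all_dots" "$fallback"
```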
Command substitution
# commands in `...`
echo `pwd` # → /home/myusername → is the current working directory
ls `echo "$1"`
# is the same as
ls $1
set `date`; echo $6 $2 $3, $4 # → 2010 7. Dez, 17:28:44
for i in `ls -t`; do ... # list in time order (ls -t)
Extended mode
# the shell option extglob must be activated
help shopt # print help
shopt extglob
# extglob off
shopt -s extglob; shopt extglob
# extglob on
#shopt -u extglob; shopt extglob
## extglob off
?(a|b|c) # zero or one of the enclosed strings
*(a|b|c) # zero or more of the enclosed strings
+(a|b|c) # one or more of the enclosed strings
@(a|b|c) # exactly one of the enclosed strings
!(a|b|c) # anything except the enclosed strings
# list all directory names, beginning with "bi", "*+" or "us"
ls -d /+(bi|*+|us)*
# /bin /lost+found /usr
# list all directory names that neither begin with "b" nor have "o" as their 2nd character
ls -d /!(b*|?o*)
# /cdrom /dev /etc /floppy /lib /mnt /opt /proc /sbin /tmp /usr /var
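The extended patterns can also be tried on a few files in a scratch directory. This is a minimal sketch with made-up file names; note that `shopt -s extglob` has to be executed before the shell parses a line that contains such a pattern, so it sits on its own line at the top.

```shell
#!/usr/bin/env bash
shopt -s extglob                     # enable extended patterns before they are parsed
export LC_ALL=C                      # predictable ordering of the expanded names
demo_dir=$(mktemp -d)                # hypothetical scratch directory
cd "$demo_dir" || exit 1
touch photo.jpg photo.jpeg scan.png notes.txt   # hypothetical demo files

images=$(echo *.@(jpg|jpeg|png))          # exactly one of the listed extensions
jpegs=$(echo *.jp?(e)g)                   # "e" zero times or once
not_images=$(echo !(*.jpg|*.jpeg|*.png))  # everything except the image files

printf '%s\n' "$images" "$jpegs" "$not_images"
cd / && rm -r "$demo_dir"            # clean up
```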
Substrings
string="0123456789stop"
echo ${string:7} # 789stop
echo ${string:0:7} # 0123456
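echo ${string:-10000} # 0123456789stop (this is the ${var:-default} form, not a negative offset)
echo ${string: -5} # 9stop (the space before the minus selects the last 5 characters)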
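Offsets and lengths can be combined to cut fixed-width fields out of a string. The sketch below assumes a hypothetical time stamp in the form YYYYMMDD-HHMM.

```shell
#!/usr/bin/env bash
stamp='20220817-1546'          # hypothetical time stamp: YYYYMMDD-HHMM

year=${stamp:0:4}              # 4 characters from offset 0
month=${stamp:4:2}             # 2 characters from offset 4
day=${stamp:6:2}               # 2 characters from offset 6
hour=${stamp: -4:2}            # 2 characters, starting 4 from the end
minute=${stamp: -2}            # the last 2 characters

iso_date="$year-$month-$day $hour:$minute"
echo "$iso_date"
```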
For Loops
#!/bin/bash
# normally IFS=" \t\n", but that is a problem in for loops because spaces cause wrong splitting
OLDIFS=$IFS
IFS=$'\n'
for datei in *.{jpg,jpeg,JPG,JPEG};do
if [ -e "$datei" ];then
echo "$datei (jpg > png) …";
convert "$datei" "${datei%.*}.png"
fi
done
IFS=$OLDIFS
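A variant of the loop above that leaves IFS untouched is sketched here: file names produced by pathname expansion are not split at spaces, so quoting the loop variable is sufficient, and `nullglob` makes the existence test unnecessary. The file names are made up, and instead of calling `convert` the sketch only collects the target names.

```shell
#!/usr/bin/env bash
export LC_ALL=C
shopt -s nullglob                    # patterns without a match expand to nothing
demo_dir=$(mktemp -d)                # hypothetical scratch directory
cd "$demo_dir" || exit 1
touch 'holiday photo.jpg' 'scan.JPEG' 'readme.txt'   # hypothetical demo files

targets=()
for image in *.{jpg,jpeg,JPG,JPEG}; do
  targets+=("${image%.*}.png")       # same name, new extension
done

printf '%s\n' "${targets[@]}"
cd / && rm -r "$demo_dir"            # clean up
```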
Useful Commands
# sort ls listing by domain Thread-…_wu.jacq.org_ rather than numeric by Thread-01…
file_pattern="Thread-*_gat.jacq.org*2022*-[0-9][0-9][0-9][0-9]_modified.rdf.gz"
ls $file_pattern | sed -r 's@(Thread-)[0-9]+_(.+)@& \2@' | sort -k 2 | sed -r 's@^([^[:space:]]+) .+$@\1@;'
stat --printf="# \e[32mfile name %n\e[0m (%s bytes)…" file
stat --printf="# \e[32mfile name %n\e[0m was modified %y" file
stat --format='%y' file | grep --only-matching --extended-regexp '^[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}'
sort
Sorting URLs by domain, regardless of the protocol (http, https, ftp, etc.):
sort -t '/' -k3.1b urilist_JACQ_20220815_todo_sorted.tsv # use “/” as the field separator (-t)
# -k3 → the sort (k)ey starts at field 3 …
# -k3.1 → … at the 1st character of that field, and runs to the end of the line
# -k3.1b → … ignoring leading (b)lanks
sort --debug -t '/' -k3.1b urilist_JACQ_20220815_todo_sorted.tsv | head -n 6 # show what it is sorting actually
# https://admont.jacq.org/ADMONT100002
# ____________________________________ for sort -t '/' -k1.1 --debug
# _____________________________ for sort -t '/' -k2.1 --debug
# ____________________________ for sort -t '/' -k3.1 --debug
# ____________ for sort -t '/' -k4.1 --debug
sort --field-separator=$'\t' --stable --key=1,4 --unique filename-tab-separated-data.tsv
# sort uniquely on fields 1 to 4 (obsolete syntax: +0 -4, with zero-based field counting); fields after the 4th are not part of the key
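The difference between sorting whole lines and sorting by domain shows up on a list with mixed protocols. In this sketch the three URLs are illustrative only (the protocols were mixed on purpose), and `LC_ALL=C` makes the byte-wise ordering explicit.

```shell
#!/usr/bin/env bash
export LC_ALL=C                      # byte-wise, locale-independent ordering
# hypothetical list with mixed protocols
urls='https://wu.jacq.org/WU0000004
http://admont.jacq.org/ADMONT100680
ftp://bak.jacq.org/BAK0-0000001'

by_line=$(printf '%s\n' "$urls" | sort)                # the protocol decides the order
by_domain=$(printf '%s\n' "$urls" | sort -t '/' -k3)   # field 3 (the domain) decides

echo "# sorted by whole line:"; echo "$by_line"
echo "# sorted by domain:";     echo "$by_domain"
```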
jq ~ Get Markdown Table from Dynamic Data Values
{ "head": {
"vars": [ "cspp_example" , "institutionID" , "publisher" , "graph" ]
} ,
"results": {
"bindings": [
{
"cspp_example": { "type": "uri" , "value": "http://coldb.mnhn.fr/catalognumber/mnhn/p/p00039900" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/03wkt5x30" } ,
"publisher": { "type": "uri" , "value": "https://science.mnhn.fr/institution/mnhn/collection/p/item/search" } ,
"graph": { "type": "uri" , "value": "http://coldb.mnhn.fr/catalognumber/mnhn/p/" }
} ,
{
"cspp_example": { "type": "uri" , "value": "http://data.biodiversitydata.nl/naturalis/specimen/113251" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/0566bfb96" } ,
"graph": { "type": "uri" , "value": "http://data.biodiversitydata.nl/naturalis/specimen/" }
} ,
{
"cspp_example": { "type": "uri" , "value": "http://id.herb.oulu.fi/0014586" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/03yj89h83" } ,
"publisher": { "type": "literal" , "value": "http://gbif.fi" } ,
"graph": { "type": "uri" , "value": "http://tun.fi" }
} ,
{
"cspp_example": { "type": "uri" , "value": "http://id.snsb.info/snsb/collection/1000/1579/1000" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/05th1v540" } ,
"publisher": { "type": "literal" , "value": "http://www.snsb.info" } ,
"graph": { "type": "uri" , "value": "http://id.snsb.info/snsb/" }
} ,
{
"cspp_example": { "type": "uri" , "value": "http://lagu.jacq.org/object/AM-02278" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/01j60ss54" } ,
"publisher": { "type": "literal" , "value": "LAGU" } ,
"graph": { "type": "uri" , "value": "http://lagu.jacq.org/object" }
} ,
{
"cspp_example": { "type": "uri" , "value": "http://specimens.kew.org/herbarium/K000989827" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/00ynnr806" } ,
"publisher": { "type": "uri" , "value": "https://www.kew.org" } ,
"graph": { "type": "uri" , "value": "http://specimens.kew.org/herbarium/" }
} ,
{
"cspp_example": { "type": "uri" , "value": "http://tbi.jacq.org/object/TBI1014287" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/051qn8h41" } ,
"publisher": { "type": "literal" , "value": "TBI" } ,
"graph": { "type": "uri" , "value": "http://tbi.jacq.org/object" }
} ,
{
"cspp_example": { "type": "uri" , "value": "http://tun.fi/MHD.107807" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/03tcx6c30" } ,
"publisher": { "type": "literal" , "value": "http://gbif.fi" } ,
"graph": { "type": "uri" , "value": "http://tun.fi" }
} ,
{
"cspp_example": { "type": "uri" , "value": "http://tun.fi/MKA.342315" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/05vghhr25" } ,
"publisher": { "type": "literal" , "value": "http://gbif.fi" } ,
"graph": { "type": "uri" , "value": "http://tun.fi" }
} ,
{
"cspp_example": { "type": "uri" , "value": "http://tun.fi/MKA.863532" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/029pk6x14" } ,
"publisher": { "type": "literal" , "value": "http://gbif.fi" } ,
"graph": { "type": "uri" , "value": "http://tun.fi" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://admont.jacq.org/ADMONT100680" } ,
"institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/128466393" } ,
"publisher": { "type": "literal" , "value": "ADMONT" } ,
"graph": { "type": "uri" , "value": "http://admont.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://bak.jacq.org/BAK0-0000001" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/006m4q736" } ,
"publisher": { "type": "literal" , "value": "BAK" } ,
"graph": { "type": "uri" , "value": "http://bak.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://boz.jacq.org/BOZ000001" } ,
"institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/128699910" } ,
"publisher": { "type": "literal" , "value": "BOZ" } ,
"graph": { "type": "uri" , "value": "http://boz.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://brnu.jacq.org/BRNU000205" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/02j46qs45" } ,
"publisher": { "type": "literal" , "value": "BRNU" } ,
"graph": { "type": "uri" , "value": "http://brnu.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://data.rbge.org.uk/herb/E00000001" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/0349vqz63" } ,
"publisher": { "type": "uri" , "value": "http://www.rbge.org.uk" } ,
"graph": { "type": "uri" , "value": "http://data.rbge.org.uk/herb/" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://dr.jacq.org/DR000023" } ,
"institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/155418159" } ,
"publisher": { "type": "literal" , "value": "DR" } ,
"graph": { "type": "uri" , "value": "http://dr.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://ere.jacq.org/ERE0000012" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/05mpgew40" } ,
"publisher": { "type": "literal" , "value": "ERE" } ,
"graph": { "type": "uri" , "value": "http://ere.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://gat.jacq.org/GAT0000014" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/02skbsp27" } ,
"publisher": { "type": "literal" , "value": "GAT" } ,
"graph": { "type": "uri" , "value": "http://gat.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://gjo.jacq.org/GJO0000012" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/00nxtmb68" } ,
"publisher": { "type": "literal" , "value": "GJO" } ,
"graph": { "type": "uri" , "value": "http://gjo.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://gzu.jacq.org/GZU000000208" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/01faaaf77" } ,
"publisher": { "type": "literal" , "value": "GZU" } ,
"graph": { "type": "uri" , "value": "http://gzu.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://hal.jacq.org/HAL0053120" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/05gqaka33" } ,
"publisher": { "type": "literal" , "value": "HAL" } ,
"graph": { "type": "uri" , "value": "http://hal.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://herbarium.bgbm.org/object/B100000004" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/00bv4cx53" } ,
"publisher": { "type": "literal" , "value": "BGBM" } ,
"graph": { "type": "uri" , "value": "http://herbarium.bgbm.org/object/" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://id.smns-bw.org/smns/collection/275449/772800/279829" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/05k35b119" } ,
"graph": { "type": "uri" , "value": "http://id.smns-bw.org/smns/collection/" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://je.jacq.org/JE00000020" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/05qpz1x62" } ,
"publisher": { "type": "literal" , "value": "JE" } ,
"graph": { "type": "uri" , "value": "http://je.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://kiel.jacq.org/KIEL0007010" } ,
"institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/239180770" } ,
"publisher": { "type": "literal" , "value": "KIEL" } ,
"graph": { "type": "uri" , "value": "http://kiel.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://lz.jacq.org/LZ161177" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/03s7gtk40" } ,
"publisher": { "type": "literal" , "value": "LZ" } ,
"graph": { "type": "uri" , "value": "http://lz.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://mjg.jacq.org/MJG000015" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/023b0x485" } ,
"publisher": { "type": "literal" , "value": "MJG" } ,
"graph": { "type": "uri" , "value": "http://mjg.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://pi.jacq.org/PI000648" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/03ad39j10" } ,
"publisher": { "type": "literal" , "value": "PI" } ,
"graph": { "type": "uri" , "value": "http://pi.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://prc.jacq.org/PRC2535" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/024d6js02" } ,
"publisher": { "type": "literal" , "value": "PRC" } ,
"graph": { "type": "uri" , "value": "http://prc.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://tub.jacq.org/TUB002830" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/03a1kwz48" } ,
"publisher": { "type": "literal" , "value": "TUB" } ,
"graph": { "type": "uri" , "value": "http://tub.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://ubt.jacq.org/UBT0010195" } ,
"institutionID": { "type": "uri" , "value": "http://viaf.org/viaf/142509930" } ,
"publisher": { "type": "literal" , "value": "UBT" } ,
"graph": { "type": "uri" , "value": "http://ubt.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://w.jacq.org/W0000011a" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/01tv5y993" } ,
"publisher": { "type": "literal" , "value": "W" } ,
"graph": { "type": "uri" , "value": "http://w.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://wu.jacq.org/WU0000004" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/03prydq77" } ,
"publisher": { "type": "literal" , "value": "WU" } ,
"graph": { "type": "uri" , "value": "http://wu.jacq.org" }
} ,
{
"cspp_example": { "type": "uri" , "value": "https://www.botanicalcollections.be/specimen/BR0000005065868" } ,
"institutionID": { "type": "uri" , "value": "https://ror.org/01h1jbk91" } ,
"graph": { "type": "uri" , "value": "http://botanicalcollections.be/specimen/" }
}
]
}
}
These results also contain missing values. The point is to extract .head.vars and use those values in the variable $fields to query all values out of .results.bindings[].[].value; the rest is formatting to get a nice-looking table:
# table head (and adding an index column)
cat institutionID_20220808.json | jq --raw-output '.head.vars | @tsv' | sed --regexp-extended 's@^@| # | @; s@$@ |@; s@[\t]@ | @g; h; s@[^|]@-@g;x;G;'
cat institutionID_20220808.json | jq -r '.head.vars | @tsv' | sed -r 's@^@| # | @; s@$@ |@; s@[\t]@ | @g; h; s@[^|]@-@g;x;G;'
# sed: s@^@| # | @; add index column-header | # | to line start
# sed: s@$@ |@; append closing table row at the end |
# sed: s@[\t]@ | @g; the values are tab separated, so replace each \t by | (the columns or table data cells)
# sed: h; put this ready formatted header into hold space buffer
# sed: s@[^|]@-@g; replace all but “|” by “-” to make the markdown header separation
# sed: x; exchange hold space buffer (formatted 1st table row) with the markdown header separation
# sed: G; now we have only the first table row in place and append (G) the markdown header separation by a \n
# and get a nice complete table header:
# | # | cspp_example | institutionID | publisher | graph |
# |---|--------------|---------------|-----------|-------|
# table body
# understand table data but sort them by another column (use sort --debug to find out)
cat institutionID_20220808.json | jq -r '.head.vars as $fields | .results.bindings[] | [.[($fields[])].value] |@tsv'
# show tab separator output
# we use “|” as colum separators and also to format using “|” for the column command
cat institutionID_20220808.json | jq --raw-output '.head.vars as $fields | .results.bindings[] | [.[($fields[])].value] |@tsv' \
| sed --regexp-extended 's@^@| @; s@$@ |@; s@[\t]@ | @g;' | column --table --separator '|' --output-separator '|' | sort --field-separator='|' --key=5.1b --debug
# short options
cat institutionID_20220808.json | jq -r '.head.vars as $fields | .results.bindings[] | [.[($fields[])].value] |@tsv' \
| sed -r 's@^@| @; s@$@ |@; s@[\t]@ | @g;' | column -t -s '|' -o '|' | sort -t '|' -k5.1b --debug
# sort -t '|' -k5.1b uses | as the field separator and sorts on the 5th field, from its 1st character to the end of the line
# -k5.1b ignore leading b(lanks)
# -k5.1Vb version sort, ignore leading b(lanks)
# -k5.1n numeric sort (etc.)
# table body (and adding an index column (sed -r "=;"))
# long options first; the equivalent command with short options follows
cat institutionID_20220808.json | jq --raw-output '.head.vars as $fields | .results.bindings[] | [.[($fields[])].value] |@tsv' \
| sed --regexp-extended 's@^@| @; s@$@ |@; s@[\t]@ | @g;' | column --table --separator '|' --output-separator '|' | sort --field-separator='|' --key=5.1b \
| sed "=" | sed --regexp-extended "/^[[:digit:]]/{ N; s@(^[[:digit:]]+)\n@| \1 @; }"
cat institutionID_20220808.json | jq -r '.head.vars as $fields | .results.bindings[] | [.[($fields[])].value] |@tsv' \
| sed -r 's@^@| @; s@$@ |@; s@[\t]@ | @g;' | column -t -s '|' -o '|' | sort -t '|' -k5.1b \
| sed "=" | sed -r "/^[[:digit:]]/{ N; s@(^[[:digit:]]+)\n@| \1 @; }"
# | 1 | https://admont.jacq.org/ADMONT100680 | http://viaf.org/viaf/128466393 | ADMONT | http://admont.jacq.org |
# | 2 | https://bak.jacq.org/BAK0-0000001 | https://ror.org/006m4q736 | BAK | http://bak.jacq.org |
# | 3 | https://www.botanicalcollections.be/specimen/BR0000005065868 | https://ror.org/01h1jbk91 | | http://botanicalcollections.be/specimen/ |
# | … | … | … | … | … |
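The sed hold-space trick that builds the table header can be tested without jq or the JSON file by feeding it a tab-separated line directly. In this sketch the column names `a` and `bb` are placeholders, and the tab is passed in through a shell variable so that the sed script does not rely on `\t` being understood.

```shell
#!/usr/bin/env bash
tab=$'\t'                            # literal tab character for the sed script

# hypothetical two-column header, tab separated
header=$(printf 'a\tbb\n' \
  | sed "s@^@| # | @; s@\$@ |@; s@${tab}@ | @g; h; s@[^|]@-@g; x; G")
# s@^@| # | @   prepend the index column header
# s@$@ |@       close the table row
# s@TAB@ | @g   turn tabs into cell separators
# h             copy the finished row into the hold space
# s@[^|]@-@g    turn the row into the separator line (everything but "|" becomes "-")
# x             swap: row back into the pattern space, separator into the hold space
# G             append the separator line after the row

echo "$header"
```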
Sed (short for stream editor)
Tutorials: https://snipcademy.com/shell-scripting-sed (good schematic diagrams of the command sequences)
Functions for BASH-Programming
comment_exit_code() {
# unused
# -------------------------------
# Usage:
# comment_exit_code $exit_code
# comment_exit_code $exit_code "Some more exact comment what was done"
# -------------------------------
local this_exit_code=$1
local this_comment=${2-}
case $this_exit_code in [1-9]|[1-9][0-9]|[1-9][0-9][0-9])
if [[ "${#this_comment}" -lt 1 ]];then
echo -e "${ORANGE}Something unexpected happened. Exit Code: ${this_exit_code} $(kill -l $this_exit_code)${NOFORMAT}"
else
echo -e "${ORANGE}Something unexpected happened: ${this_comment}. Exit Code: ${this_exit_code} $(kill -l $this_exit_code)${NOFORMAT}"
fi
;;
esac
}
repeat_text() {
# -------------------------------
# Usage:
# repeat_text n-times text
# repeat_text 10 '.'
# prints 10 dots: ..........
# repeat_text 10 '.' storingvariablename
# stores 10 dots to $storingvariablename
# -------------------------------
# $1=number of patterns to repeat
# $2=pattern
# $3=output variable name
local tmp
local local_1=$1
local local_2=$2
local local_3=${3-}
printf -v tmp '%*s' "$local_1"
if [[ "$local_3" ]];then
printf -v "$local_3" '%s' "${tmp// /$local_2}"
else
printf '%s' "${tmp// /$local_2}"
fi
}
setup_colors() {
# 0 - Normal Style; 1 - Bold; 2 - Dim; 3 - Italic; 4 - Underlined; 5 - Blinking; 7 - Reverse; 8 - Invisible;
if [[ -t 2 ]] && [[ -z "${NO_COLOR-}" ]] && [[ "${TERM-}" != "dumb" ]]; then
NOFORMAT='\033[0m'
BOLD='\033[1m' ITALIC='\033[3m'
BLUE='\033[0;34m' BLUE_BOLD='\033[1;34m' BLUE_ITALIC='\033[3;34m'
CYAN='\033[0;36m' CYAN_BOLD='\033[1;36m' CYAN_ITALIC='\033[3;36m'
GREEN='\033[0;32m' GREEN_BOLD='\033[1;32m' GREEN_ITALIC='\033[3;32m'
ORANGE='\033[0;33m' ORANGE_BOLD='\033[1;33m' ORANGE_ITALIC='\033[3;33m'
PURPLE='\033[0;35m' PURPLE_BOLD='\033[1;35m' PURPLE_ITALIC='\033[3;35m'
RED='\033[0;31m' RED_BOLD='\033[1;31m' RED_ITALIC='\033[3;31m'
YELLOW='\033[1;33m' YELLOW_BOLD='\033[1;33m' YELLOW_ITALIC='\033[3;33m'
else
NOFORMAT=''
BOLD='' ITALIC=''
BLUE='' BLUE_BOLD='' BLUE_ITALIC=''
CYAN='' CYAN_BOLD='' CYAN_ITALIC=''
GREEN='' GREEN_BOLD='' GREEN_ITALIC=''
ORANGE='' ORANGE_BOLD='' ORANGE_ITALIC=''
PURPLE='' PURPLE_BOLD='' PURPLE_ITALIC=''
RED='' RED_BOLD='' RED_ITALIC=''
YELLOW='' YELLOW_BOLD='' YELLOW_ITALIC=''
fi
}
setup_colors
test_dependencies() {
local exit_level=0
if ! [[ -e "${working_directory}/${file_input}" ]];then
echo -e "${ORANGE}# We can not find the data in ${NOFORMAT}${working_directory}/${file_input}${ORANGE} (stop)${NOFORMAT}";
exit_level=1;
fi
if ! [[ -x "$(command -v "$bin_dwcagent")" ]]; then
printf "${ORANGE}Command${NOFORMAT} $bin_dwcagent ${ORANGE} to parse names was not found. See https://libraries.io/rubygems/dwc_agent${NOFORMAT}\n"; exit_level=1;
fi
if ! [[ -x "$(command -v awk)" ]]; then
printf "${ORANGE}Command${NOFORMAT} awk ${ORANGE} to read the data was not found. Please install it in your software management system.${NOFORMAT}\n";
exit_level=1;
fi
case $exit_level in [1-9])
printf "${ORANGE}(stop)${NOFORMAT}\n";
exit 1;;
esac
}
test_dependencies
processinfo() {
echo -e "${GREEN}# ---------------------------- ${NOFORMAT}"
echo -e "${GREEN}# Description: We read ${NOFORMAT}${file_input}${GREEN} and search for name lists of multiple names (in this case: containing an ampersand &) …${NOFORMAT}"
echo -e "${GREEN}# We would parse it with ${NOFORMAT}${bin_dwcagent}${GREEN} and …${NOFORMAT}"
echo -e "${GREEN}# … would write all parsed names into …${NOFORMAT}"
echo -e "${GREEN}# ${NOFORMAT}${file_output}"
echo -e "${GREEN}# ${NOFORMAT}${file_output_unique}"
echo -e "${GREEN}# We would parse names from single text lines, which is slow but overall more accurate.${NOFORMAT}"
echo -e "${GREEN}# ($N_parallel parallel executions of dwcagent)${NOFORMAT}"
}
Date and Time
# seconds to days hours min sec (→ https://unix.stackexchange.com/a/338844 “bash - Displaying seconds as days/hours/mins/seconds?”)
seconds="755";date --utc --date="@$seconds" +"$(( $seconds/3600/24 )) days %H hours %Mmin %Ssec"
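The same conversion can be done with shell arithmetic only, which avoids the dependency on GNU date options. The function name `format_duration` is made up for this sketch; the output format follows the command above.

```shell
#!/usr/bin/env bash
# format_duration SECONDS: print a duration as days, hours, minutes and seconds
format_duration() {
  local total=$1
  local days=$(( total / 86400 ))            # 86400 seconds per day
  local hours=$(( total % 86400 / 3600 ))    # remaining full hours
  local minutes=$(( total % 3600 / 60 ))     # remaining full minutes
  local secs=$(( total % 60 ))               # remaining seconds
  printf '%d days %02d hours %02dmin %02dsec\n' "$days" "$hours" "$minutes" "$secs"
}

format_duration 755      # 12 minutes and 35 seconds
format_duration 90061    # one day, one hour, one minute and one second
```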
Calculate time process
#!/bin/bash
# pick whichever of the two command names is installed (the name depends on the distribution)
if command -v datediff &> /dev/null
then
# echo "Command datediff found"
exec_datediff="datediff"
elif command -v dateutils.ddiff &> /dev/null
then
# echo "Command dateutils.ddiff found"
exec_datediff="dateutils.ddiff"
else
echo -e "\e[31m# Error: Neither datediff nor dateutils.ddiff could be found. Please install package dateutils.\e[0m"
exit 1
fi
datetime_start=`date --rfc-3339 'ns'` ;
echo "Sleep for 5 seconds… or some other process is going on …"; sleep 5; echo "Completed";
datetime_end=`date --rfc-3339 'ns'`;
date --date="$datetime_start" '+# Started: %Y-%m-%d %H:%M:%S%:z'
date --date="$datetime_end" '+# Ended: %Y-%m-%d %H:%M:%S%:z'
# echo "# Started: $datetime_start"
# echo "# Ended: $datetime_end"
$exec_datediff "$datetime_start" "$datetime_end" -f "# Done. This took %dd %0Hh:%0Mm:%0Ss to do something"
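Looking up a tool that may be installed under one of several names can be factored into a small helper. A minimal sketch, assuming a hypothetical function name first_available_command that does not appear in the original script:

```shell
#!/bin/bash
# Hypothetical helper (name is an assumption for illustration):
# print the first of the given command names that exists on this system.
first_available_command() {
  local candidate
  for candidate in "$@"; do
    if command -v "$candidate" > /dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1   # none of the candidates was found
}

# Usage: pick whichever name the dateutils package provides here
if exec_datediff=$(first_available_command datediff dateutils.ddiff); then
  echo "# Using $exec_datediff"
else
  echo "# Error: neither datediff nor dateutils.ddiff found. Please install package dateutils." >&2
fi
```

The order of the arguments sets the preference: the first name that is found wins, so the variable is also set when both names are installed.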
get_timediff_for_njobs_new () {
# Description: calculate estimated time to finish n jobs and the estimated total time
# ---------------------------------
# Dependency: package dateutils
# ---------------------------------
# Usage:
# get_timediff_for_njobs_new --test # to check for dependencies (datediff)
# get_timediff_for_njobs_new begintime nowtime ntotaljobs njobsnowdone
# get_timediff_for_njobs_new "2021-12-06 16:47:29" "2021-12-09 13:38:08" 696926 611613
# ---------------------------------
# echo '('`date +"%s.%N"` ' * 1000)/1' | bc # get milliseconds
# echo '('`date +"%s.%N"` ' * 1000000)/1' | bc # get nanoseconds
# echo $( date --rfc-3339 'ns' ) | ( read -rsd '' x; echo ${x@Q} ) # escaped
# ---------------------------------
local this_command_timediff
# read if test mode to check commands
while [[ "$#" -gt 0 ]]
do
case $1 in
-t|--test)
doexit=0
if ! command -v datediff &> /dev/null && ! command -v dateutils.ddiff &> /dev/null
then
echo -e "# \e[31mError: Neither command datediff nor dateutils.ddiff could be found. Please install package dateutils.\e[0m"
doexit=1
fi
if ! command -v sed &> /dev/null
then
echo -e "# \e[31mError: command sed (stream editor) could not be found. Please install package sed.\e[0m"
doexit=1
fi
if ! command -v bc &> /dev/null
then
echo -e "# \e[31mError: command bc (arbitrary precision calculator) could not be found. Please install package bc.\e[0m"
doexit=1
fi
if [[ $doexit -gt 0 ]];then # at least one dependency is missing
exit 1;
else
return 0 # status 0 means success; leave the function
fi
;;
*)
break
;;
esac
done
# prefer datediff, fall back to dateutils.ddiff (works also if both are installed)
if command -v datediff &> /dev/null
then
# echo "Command datediff found"
this_command_timediff="datediff"
elif command -v dateutils.ddiff &> /dev/null
then
# echo "Command dateutils.ddiff found"
this_command_timediff="dateutils.ddiff"
fi
# START estimate time to do
# convert also "2022-06-30_14h56m10s" to "2022-06-30 14:56:10"
this_given_start_time=$( echo "$1" | sed -r 's@([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2})[_[:space:]-]([[:digit:]]{2})h([[:digit:]]{2})m([[:digit:]]{2})s@\1 \2:\3:\4@' )
this_given_now_time=$( echo "$2" | sed -r 's@([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2})[_[:space:]-]([[:digit:]]{2})h([[:digit:]]{2})m([[:digit:]]{2})s@\1 \2:\3:\4@' )
local this_unixnanoseconds_start_timestamp=$(date --date="$this_given_start_time" '+%s.%N')
local this_unixnanoseconds_now=$(date --date="$this_given_now_time" '+%s.%N')
local this_unixseconds_todo=0
local this_n_jobs_all=$(expr $3 + 0)
local this_i_job_counter=$(expr $4 + 0)
# echo "scale=10; 1642073008.587244684 - 1642028400.000000000" | bc -l
local this_timediff_unixnanoseconds=`echo "scale=10; $this_unixnanoseconds_now - $this_unixnanoseconds_start_timestamp" | bc -l`
# $(( $this_unixnanoseconds_now - $this_unixnanoseconds_start_timestamp ))
local this_n_jobs_todo=$(( $this_n_jobs_all - $this_i_job_counter ))
local this_msg_estimated_sofar=""
# echo -e "\033[2m# DEBUG Test mode: all together $this_n_jobs_all ; counter $this_i_job_counter\033[0m"
if [[ $this_n_jobs_all -eq $this_i_job_counter ]];then # done
this_unixseconds_todo=0
# njobs_done_so_far=`$this_command_timediff "@$this_unixnanoseconds_start_timestamp" "@$this_unixnanoseconds_now" -f "all $this_i_job_counter done, duration %dd %0Hh:%0Mm:%0Ss"`
this_msg_estimated_sofar="nothing left to do"
else
# this_unixseconds_todo=$(( $this_timediff_unixnanoseconds * $this_n_jobs_todo / $this_i_job_counter ))
this_unixseconds_todo=`echo "scale=0; $this_timediff_unixnanoseconds * $this_n_jobs_todo / $this_i_job_counter" | bc -l`
job_singular_or_plural=$([ $this_n_jobs_todo -gt 1 ] && echo jobs || echo job )
if [[ $this_unixseconds_todo -ge $(( 60 * 60 * 24 * 2 )) ]];then
this_msg_estimated_sofar=`$this_command_timediff "@0" "@$this_unixseconds_todo" -f "Still $this_n_jobs_todo $job_singular_or_plural to do, estimated end %0ddays %0Hh:%0Mmin:%0Ssec"`
elif [[ $this_unixseconds_todo -ge $(( 60 * 60 * 24 )) ]];then
this_msg_estimated_sofar=`$this_command_timediff "@0" "@$this_unixseconds_todo" -f "Still $this_n_jobs_todo $job_singular_or_plural to do, estimated end %0dday %0Hh:%0Mmin:%0Ssec"`
elif [[ $this_unixseconds_todo -ge $(( 60 * 60 * 1 )) ]];then
this_msg_estimated_sofar=`$this_command_timediff "@0" "@$this_unixseconds_todo" -f "Still $this_n_jobs_todo $job_singular_or_plural to do, estimated end %0Hh:%0Mmin:%0Ssec"`
elif [[ $this_unixseconds_todo -lt $(( 60 * 60 * 1 )) ]];then
this_msg_estimated_sofar=`$this_command_timediff "@0" "@$this_unixseconds_todo" -f "Still $this_n_jobs_todo $job_singular_or_plural to do, estimated end %0Mmin:%0Ssec"`
fi
fi
this_unixseconds_done=`printf "%.0f" $(echo "scale=0; $this_unixnanoseconds_now - $this_unixnanoseconds_start_timestamp" | bc -l)`
this_unixseconds_total=`printf "%.0f" $(echo "scale=0; $this_unixseconds_done + $this_unixseconds_todo" | bc -l)`
if [[ $this_unixseconds_total -ge $(( 60 * 60 * 24 * 2 )) ]];then
this_msg_time_total=`$this_command_timediff "@0" "@$this_unixseconds_total" -f "total time: %0ddays %0Hh:%0Mmin:%0Ssec"`
elif [[ $this_unixseconds_total -ge $(( 60 * 60 * 24 )) ]];then
this_msg_time_total=`$this_command_timediff "@0" "@$this_unixseconds_total" -f "total time: %0dday %0Hh:%0Mmin:%0Ssec"`
elif [[ $this_unixseconds_total -ge $(( 60 * 60 * 1 )) ]];then
this_msg_time_total=`$this_command_timediff "@0" "@$this_unixseconds_total" -f "total time: %0Hh:%0Mmin:%0Ssec"`
elif [[ $this_unixseconds_total -lt $(( 60 * 60 * 1 )) ]];then
this_msg_time_total=`$this_command_timediff "@0" "@$this_unixseconds_total" -f "total time: %0Mmin:%0Ssec"`
fi
if ! [[ $this_unixseconds_todo -eq 0 ]];then this_msg_time_total="estimated $this_msg_time_total"; fi
#echo "from $this_n_jobs_all, $njobs_done_so_far; $this_msg_estimated_sofar"
echo "${this_msg_estimated_sofar} (${this_msg_time_total})"
# END estimate time to do
}
export -f get_timediff_for_njobs_new # export needed otherwise /usr/bin/bash: get_timediff_for_njobs_new: command not found
get_timediff_for_njobs_new --test
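The estimate inside get_timediff_for_njobs_new rests on a simple proportion: remaining time = elapsed time × jobs still to do / jobs already done. A minimal sketch of just that arithmetic, using whole seconds only; the helper name estimate_remaining_seconds is an assumption for illustration and not part of the function above:

```shell
#!/bin/bash
# Hypothetical helper (name is an assumption for illustration):
# estimate the remaining seconds from elapsed seconds, the total number
# of jobs and the number of jobs already done.
estimate_remaining_seconds() {
  local elapsed=$1 n_total=$2 n_done=$3
  if [ "$n_done" -le 0 ]; then
    # no finished job yet: no rate to extrapolate from
    echo "n_done must be greater than 0" >&2
    return 1
  fi
  echo $(( elapsed * (n_total - n_done) / n_done ))
}

# 4 of 10 jobs took 100 seconds, so the 6 remaining need about 150 seconds
estimate_remaining_seconds 100 10 4   # prints: 150
```

The guard against zero finished jobs matters because the proportion divides by the number of jobs done.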