Compare commits

18 Commits
v1.0 ... master

Author SHA1 Message Date
fea8ba7ede Merge branch 'dev_package' 2020-10-05 09:13:51 +02:00
d3c98442a9 Added max entropy calculator 2020-10-05 09:11:13 +02:00
ptrstr b41b13589d Merge pull request #4 from ptrstr/dev_package: Added installation for pip 2020-10-05 08:54:47 +02:00
6fbfea9c9a Added installation for pip 2020-10-03 10:49:26 -04:00
ece4b7f9dd Tried to package the repo under new name and version 2020-09-02 09:12:41 +02:00
c6b4bb6f4f Removed outdated deepsource config 2020-09-02 08:31:12 +02:00
599a105b21 Code Cleanup 2020-07-25 13:27:41 +02:00
7ce9197012 Added Deepsource Analyzer 2020-01-24 23:14:25 +01:00
300bfd9b54 Minor README Update 2020-01-09 23:08:12 +01:00
4eb4395132 Added quotation marks to input strings in README 2019-07-21 11:04:51 +02:00
f3e4276fca Update README.md 2019-07-18 14:25:46 +02:00
0d623b4b31 Update README.md 2019-07-18 14:05:51 +02:00
f85c3120ee Added maximum entropy calculator
- Optimised some of the code
2019-07-18 13:58:45 +02:00
1b6d45b145 Updated Version Number
- This is still a beta release (doesn't work)
2019-07-16 00:16:16 +02:00
d0ba734016 More Prep Module Prep 2019-07-16 00:04:56 +02:00
aa1f3da92f Bugfixes + Renamed to entro_py 2019-07-15 23:48:44 +02:00
710e0b2069 Prepared for packaging 2019-07-15 23:14:42 +02:00
3d3f6a44ac Renamed variables improving compatability 2019-07-15 22:45:10 +02:00
6 changed files with 160 additions and 97 deletions

4
.gitignore vendored Normal file

@@ -0,0 +1,4 @@
.env/
__pycache__/
dist/
*.egg-info/

README.md

@@ -1,23 +1,33 @@
# Entropy Calculator
Written in Python, this calculates the information entropy of a given string or file.
Written in Python, this calculates the information entropy and maximum entropy of a given string or file.
## How does this work?
## How does it work?
This is a pretty simple calculator: it takes the negative sum, over all chars in a given string, of each char's probability multiplied by the logarithm to the base two of that probability. The probability of a char is simply its number of occurrences divided by the total number of chars.
Mathematically speaking this is -sum(p*log(p)) with p being the probability of a char occurring.
Mathematically speaking this is `-sum(p*log(p))` with p being the probability of a char occurring. The maximum entropy calculation is explained below.
## What is it good for?
Well that is basically up to you. Entropy functions are used in Computer Science mainly for calculation of compression potential or in cryptography. But you are free to use this for whatever you want. I wrote it as preparation for one of my exams.
*Warning:* This can only be used for calculating the entropy of strings (by alphabet). There are however other types like coin tosses of fair or unfair coins (...), but you're gonna have to write calculators for this on your own - for now.
*Warning:* This can only be used for calculating the entropy of strings (by alphabet). There are however other types, like tosses of fair or unfair coins etc., but you're gonna have to write calculators for those on your own - for now.
*Update:* This script can now calculate the maximum entropy too. This is pretty useful for pre-compression analyses. Maximum entropy is calculated by assuming a uniform distribution over the alphabet (every distinct char equally likely) and calculating the entropy of that, like: `-1 * SIZE_OF_ALPHABET * (DISTINCT_PROBABILITY * log(DISTINCT_PROBABILITY, 2))`.
## Installing
You can install this package easily with `pip`:
```
$ pip install git+https://github.com/creyD/entro.py@dev_package
```
## Usage
You can run as much calculations as you want in one run of the script. For example use it like this with a simple string:
You can run as many calculations as you want in one run of the script. For example, use it like this with a simple string (you can skip the quotation marks if your string contains no spaces):
```
entro.py teststring
entro.py "teststring"
```
or this for a file:
@@ -28,18 +38,19 @@ entro.py -files test.txt
or combine both of them:
```
entro.py teststring -files test.txt
entro.py "teststring" -files test.txt
```
Both arguments work with as many strings and filepaths as you want. Just separate them using a space like this:
```
entro.py teststring teststring2 teststring3 -files test1.txt -files test2.txt
entro.py "teststring" "teststring2" teststring3 -files test1.txt -files test2.txt
```
## Command line parameters
### Output Adjustments
`--simple` determines whether the output is explicit like this:
```
@@ -58,9 +69,21 @@ Entropy: 1.5 bits
---
```
### String conversion
`--max` determines whether the output includes the maximum entropy; it will still be shown even if `--simple` is set:
```
---
Content: TEST
Probabilities: {'T': 0.5, 'E': 0.25, 'S': 0.25}
Entropy: 1.5 bits
Maximum Entropy: 1.584962500721156 bits
---
```
### String Conversion
`--lower` - Converts the input strings to lowercase
`--upper` - Opposite of lower, converts to uppercase (if both are set, only lower will be executed)
`--squash` - Removes all whitespaces from input files (*This doesn't apply to input strings as they will be separated by spaces anyways!*)
`--squash` - Removes all whitespaces from input files (*This only applies to command line inputs if they were surrounded by quotation marks! Otherwise they will be split at the spaces into separate arguments.*)
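The two formulas described in the README can be sketched in a few self-contained lines of Python (the function names `entropy` and `max_entropy` here are illustrative, not part of this package):

```python
import math

def entropy(s):
    # Probability of each distinct char: occurrences divided by total length
    probs = [s.count(c) / len(s) for c in set(s)]
    # Shannon entropy: -sum(p * log2(p))
    return -sum(p * math.log(p, 2) for p in probs)

def max_entropy(s):
    # Entropy of a uniform distribution over the distinct chars of s
    n = len(set(s))
    return -n * ((1 / n) * math.log(1 / n, 2))

print(entropy("TEST"))      # 1.5
print(max_entropy("TEST"))  # log2(3), about 1.585
```

For "TEST" the probabilities are {'T': 0.5, 'E': 0.25, 'S': 0.25}, which matches the example output shown in the README section above.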

calc_entro.py (deleted file)

@@ -1,87 +0,0 @@
'''
calc_entro.py calculates the entropy of a given string or file
This uses the negative sum of the log (to the base of 2) of the probability
times the probability of a char to occur in a certain string as the entropy.
'''
import math
import argparse


# Calculates the entropy of a given string (as described in the docstring)
def calcEntropy(string):
    alphabet, alphabet_size, entropy = {}, 0, 0
    for char in string:
        if char in alphabet:
            alphabet[char] += 1
        else:
            alphabet[char] = 1
        alphabet_size += 1
    for char in alphabet:
        alphabet[char] = alphabet[char] / alphabet_size
        entropy += alphabet[char] * math.log(alphabet[char], 2)
    return -entropy, alphabet


# Outputs a given entropy including the original text and the alphabet with probabilities
def printEntropy(original, entropy, alphabet, simple):
    print('---')
    if simple == False:
        print('Content: ' + original)
        print('Probabilities: ' + str(alphabet))
    print('Entropy: ' + str(entropy) + ' bits')
    print('---')


# Reads a file by a given path
def getFile(path):
    f = open(path, 'r')
    content = f.read().replace('\n', ' ')
    f.close()
    return content.strip()


# List of the arguments one can use to influence the behavior of the program
parser = argparse.ArgumentParser(description='Calculate the information entropy of some strings.')
# INPUT ARGUMENTS
parser.add_argument('strings', nargs='*', default='', type=str, help='Strings to calculate the entropy of.')
parser.add_argument('--files', nargs='*', type=str, default='', help='Provide file path(s) to calculate the entropy of.')
# OUTPUT OPTIONS
parser.add_argument('--simple', nargs='?', type=bool, default=False, help='Determines the explicitness of the output. (True = only entropy shown)')
# CONVERT OPTIONS
parser.add_argument('--lower', nargs='?', type=bool, default=False, help='Converts given strings or textfiles to lowercase before calculating.')
parser.add_argument('--upper', nargs='?', type=bool, default=False, help='Converts given strings or textfiles to uppercase before calculating.')
parser.add_argument('--squash', nargs='?', type=bool, default=False, help='Removes all whitespaces before calculating.')
args = parser.parse_args()

# Prepares the queue of different strings
queue = []
# Add all the provided strings to the list
for string in args.strings:
    queue.append(string)
# Add all the provided files to the list
for file in args.files:
    string = getFile(file)
    queue.append(string)

# Iterates over the collected strings and prints the entropies
for string in queue:
    if args.lower != False:
        string = string.lower()
    elif args.upper != False:
        string = string.upper()
    if args.squash != False:
        string = string.replace(" ", "")
    a, b = calcEntropy(string)
    printEntropy(string, a, b, args.simple)

45
entro_py_min/__main__.py Normal file

@@ -0,0 +1,45 @@
from . import entro_py_min
import argparse

# List of the arguments one can use to influence the behavior of the program
parser = argparse.ArgumentParser('entro_py_min', description='Calculate the information entropy of alphabets.')
# INPUT ARGUMENTS
parser.add_argument('strings', nargs='*', default='', type=str, help='Strings to calculate the entropy of.')
parser.add_argument('--files', nargs='*', type=str, default='', help='Provide file path(s) to calculate the entropy of.')
# OUTPUT OPTIONS
parser.add_argument('--simple', nargs='?', type=bool, default=False, help='Determines the explicitness of the output. (True = only entropy shown)')
parser.add_argument('--max', nargs='?', type=bool, default=False, help='Includes the maximum entropy.')
# CONVERT OPTIONS
parser.add_argument('--lower', nargs='?', type=bool, default=False, help='Converts given strings or textfiles to lowercase before calculating.')
parser.add_argument('--upper', nargs='?', type=bool, default=False, help='Converts given strings or textfiles to uppercase before calculating.')
parser.add_argument('--squash', nargs='?', type=bool, default=False, help='Removes all whitespaces before calculating.')
args = parser.parse_args()

# Prepares the queue of different strings
queue = []
# Add all the provided strings to the list
for string in args.strings:
    queue.append(string)
# Add all the provided files to the list
for file in args.files:
    string = entro_py_min.readEntropyFile(file)
    queue.append(string)

# Iterates over the collected strings and prints the entropies
for string in queue:
    if args.lower:
        string = string.lower()
    elif args.upper:
        string = string.upper()
    if args.squash:
        string = string.replace(" ", "")
    a, b, c = entro_py_min.calculateEntropy(string)
    entro_py_min.printEntropy(string, a, b, args.simple, (False if not args.max else c))
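One subtlety in the flag definitions above: `argparse` with `type=bool` does not parse the string 'False' as a false value, because `bool()` on any non-empty string is `True`. A short stand-alone demonstration (the parser here is a sketch mirroring one of the flags above, not the package's actual parser object):

```python
import argparse

# Stand-alone parser mimicking the --simple flag defined above
parser = argparse.ArgumentParser()
parser.add_argument('--simple', nargs='?', type=bool, default=False)

# bool('False') is True, since 'False' is a non-empty string
args = parser.parse_args(['--simple', 'False'])
print(args.simple)  # True
```

The idiomatic alternative for on/off flags is `action='store_true'`, which takes no value at all.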


@@ -0,0 +1,57 @@
import math


# Calculates the entropy of a given string
# Returns the entropy and an alphabet with the calculated probabilities
def calculateEntropy(input_string):
    alphabet, alphabet_size, entropy = {}, len(input_string), 0
    for char in input_string:
        if char in alphabet:
            alphabet[char] += 1
        else:
            alphabet[char] = 1
    for char in alphabet:
        alphabet[char] = alphabet[char] / alphabet_size
        entropy -= alphabet[char] * math.log(alphabet[char], 2)
    max_entropy = -len(alphabet) * (1 / len(alphabet) * math.log(1 / len(alphabet), 2))
    return entropy, alphabet, max_entropy


# Calculates the entropy of a given string
# Returns only the entropy in bits as this is the minimal function
def calculateEntropyMin(input_string):
    alphabet, alphabet_size, entropy = {}, len(input_string), 0
    for char in input_string:
        if char in alphabet:
            alphabet[char] += 1
        else:
            alphabet[char] = 1
    for char in alphabet:
        i = alphabet[char] / alphabet_size
        entropy -= i * math.log(i, 2)
    return entropy


# Outputs a given entropy including the original text and the alphabet with probabilities
def printEntropy(original_string, entropy_value, alphabet_dict, simple_bool, max_value):
    print('---')
    if not simple_bool:
        print('Content: ' + original_string)
        print('Probabilities: ' + str(alphabet_dict))
    print('Entropy: ' + str(entropy_value) + ' bits')
    if max_value:
        print('Maximum Entropy: ' + str(max_value) + ' bits')
    print('---')


# Reads a file by a given path
def readEntropyFile(path_string):
    f = open(path_string, 'r')
    content = f.read().replace('\n', ' ')
    f.close()
    return content.strip()
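The three return values of `calculateEntropy` can be sanity-checked against the README example; the function below is copied from the diff above so the snippet runs stand-alone:

```python
import math

# Copied from calculateEntropy in the diff above
def calculateEntropy(input_string):
    alphabet, alphabet_size, entropy = {}, len(input_string), 0
    for char in input_string:
        if char in alphabet:
            alphabet[char] += 1
        else:
            alphabet[char] = 1
    for char in alphabet:
        alphabet[char] = alphabet[char] / alphabet_size
        entropy -= alphabet[char] * math.log(alphabet[char], 2)
    max_entropy = -len(alphabet) * (1 / len(alphabet) * math.log(1 / len(alphabet), 2))
    return entropy, alphabet, max_entropy

e, probs, m = calculateEntropy('TEST')
print(e)      # 1.5
print(probs)  # {'T': 0.5, 'E': 0.25, 'S': 0.25}
print(m)      # log2(3), about 1.585
```

These values match the `--max` example output shown in the README diff.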

21
setup.py Normal file

@@ -0,0 +1,21 @@
import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="entro_py_min",
    version="0.0.1",
    author="Conrad Großer",
    author_email="grosserconrad@gmail.com",
    description="Small Information Entropy Calculator",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/creyD/entro.py",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
)