this post was submitted on 30 Jan 2024
13 points (100.0% liked)

Technology


The SHA-1 hash for 64test64xa is 6779c53432b8badf049bb9d8924a5785dd887243, which is 40 characters using only hexadecimal: 10 digits and 6 letters. But how long would it be if it used all 26 letters of the Latin alphabet? What if it also differentiated between UPPER and lower case?

top 15 comments
[–] [email protected] 31 points 9 months ago

So, hexadecimal uses 16 characters. Each character stores 4 bits of data (2⁴ = 16).

If you use the 10 digits and 26 letters of the Latin alphabet, the resulting encoding is called Base36.

It is a rather impractical format for storing data, though, because for purposes of simple conversion, the number of possibilities should be a power of 2 -- that way a program can do (quick) bit shifts instead of (difficult, especially on big numbers) division to determine which character to use. That's why it's mostly used to encode numbers, and not large sequences of data.
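
To make the shift-versus-division point concrete, here is a minimal Python sketch. The function names and the alphabet are my own choices for illustration, and it treats the data as one big integer, which is not how standard Base32 or Base64 group bits (those work left to right on bytes).

```python
# Digits 0-9 then a-z; slicing gives an alphabet for any base up to 36.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def digits_by_shift(n: int, bits: int, alphabet: str) -> str:
    """Power-of-two base: peel off `bits` bits at a time with a mask and a shift."""
    mask = (1 << bits) - 1
    out = []
    while True:
        out.append(alphabet[n & mask])  # lowest `bits` bits pick the character
        n >>= bits                      # drop those bits
        if n == 0:
            break
    return "".join(reversed(out))


def digits_by_division(n: int, alphabet: str) -> str:
    """Any base: repeated division of the whole (possibly huge) number."""
    base = len(alphabet)
    out = []
    while True:
        n, remainder = divmod(n, base)
        out.append(alphabet[remainder])
        if n == 0:
            break
    return "".join(reversed(out))


print(digits_by_shift(1000, 5, ALPHABET[:32]))   # base 32 via shifts
print(digits_by_division(1000, ALPHABET))        # base 36 via division
```

For a power-of-two base both functions agree, but the shift version never has to divide the full number.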

Base32 is a slightly-smaller variant that can fit 5 bits of data into one character. (2⁵ = 32)

If you add up digits, uppercase and lowercase characters together (differentiating between upper and lower case), you get 62. This is also an impractical number for computer purposes. But add two extra characters and you get 64, which is another nice power of two (2⁶ = 64), letting one character store 6 bits. And Base64 is a common encoding scheme for data.


And when you know how many bits a character can fit, you can calculate how "efficient" the encoding will be and how many characters will be needed to store data. A Base32 encoding will need 20% fewer characters than hexadecimal, and Base64 needs 33.3% fewer.
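
A quick sketch of that arithmetic in Python, for the 160-bit hash from the question. The helper names are made up for this example, and padding characters (like Base64's =) are ignored.

```python
def chars_needed(n_bits: int, bits_per_char: int) -> int:
    """Characters needed to hold n_bits when each character stores bits_per_char bits."""
    return -(-n_bits // bits_per_char)  # ceiling division on integers


def saving_vs_hex(bits_per_char: int) -> float:
    """Fraction of characters saved compared with hexadecimal (4 bits per character)."""
    return 1 - 4 / bits_per_char


for name, bits in [("hex", 4), ("Base32", 5), ("Base64", 6)]:
    print(f"{name}: {chars_needed(160, bits)} characters for 160 bits, "
          f"{saving_vs_hex(bits):.1%} fewer than hex")
```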

[–] [email protected] 9 points 9 months ago* (last edited 9 months ago)
  1. It's a 160-bit hash, so using letters and numbers, it'd be log base (10+26) of 2^160, which is roughly 31. So 31 letters.
  2. Using upper and lower case, it'd be log base (10+26+26) of 2^160, or 27 letters.
  3. Don't use SHA-1; use SHA-256
  4. Upper and lower case to represent SHA-256 would be log base (10+26+26) of 2^256, 43 letters
  5. Internally, a SHA-256 hash is 32 bytes: 32 "letters" of 8 bits each, using all 256 possible byte values (more than ASCII, which only defines 128). The string representation only matters when you're exchanging the hash over a medium where it needs to be robust and human-readable. Squeezing it into fewer characters is probably not worth the cost: it becomes unclear which encoding you chose, and anyone converting to and from the format has a harder time. Hexadecimal is a little bigger, but it's clear and unambiguous what you've done, which a full-alphabet encoding is not.
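
If you want to check the numbers in points 1, 2 and 4, here is a small Python sketch. The function `min_chars` is a name I made up; it uses exact integers instead of floating-point logs, and prints the log estimate next to it for comparison.

```python
import math


def min_chars(n_bits: int, alphabet_size: int) -> int:
    """Smallest n such that alphabet_size**n >= 2**n_bits (exact integer arithmetic)."""
    n = 0
    capacity = 1
    while capacity < (1 << n_bits):
        capacity *= alphabet_size
        n += 1
    return n


for label, bits, size in [
    ("SHA-1, digits + 26 letters", 160, 10 + 26),
    ("SHA-1, digits + upper + lower", 160, 10 + 26 + 26),
    ("SHA-256, digits + upper + lower", 256, 10 + 26 + 26),
]:
    estimate = bits / math.log2(size)  # log base `size` of 2**bits
    print(f"{label}: {min_chars(bits, size)} characters (log estimate {estimate:.2f})")
```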
[–] [email protected] 9 points 9 months ago (1 children)

You could try Base64, maybe? The above would be: Z3nFNDK4ut8Em7nYkkpXhd2IckM= (28 characters)

Base64 uses A-Z, a-z, 0-9, and the + and / characters to encode 6 bits per character. That means you can encode every 3 bytes (or 6 hex digits) in 4 characters (since 3 * 8 bits = 4 * 6 bits). If the data are not an exact multiple of 3 bytes, the last group of 4 characters gets padded out with = signs.
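
Here is a rough Python sketch of the 3-bytes-to-4-characters grouping, applied to the hash from the post. `encode_group` and `padded_length` are hypothetical helpers written for this example; the real work is done by the standard library's `base64.b64encode`.

```python
import base64

B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# The 20 raw bytes of the SHA-1 hash quoted in the post.
digest = bytes.fromhex("6779c53432b8badf049bb9d8924a5785dd887243")


def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters (4 x 6 bits)."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    n = int.from_bytes(three_bytes, "big")
    return "".join(B64_ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))


def padded_length(n_bytes: int) -> int:
    """Length of the Base64 text, including = padding, for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


first_group = encode_group(digest[:3])               # bytes 67 79 c5
encoded = base64.b64encode(digest).decode("ascii")   # whole digest, with padding
print(first_group, encoded, padded_length(len(digest)))
```

Since 20 bytes is not a multiple of 3, the last group only has 2 bytes of data, which is why the result ends in a single =.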

[–] [email protected] 2 points 9 months ago* (last edited 9 months ago)

There's also a Base64URL variant that is a little friendlier in the modern world, where +, / and = often need escape sequences.

The first two are replaced with more URL-friendly characters (- and _), and the third is usually just left off. Do you really need padding?
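
A minimal Python sketch of the difference, using the standard library's URL-safe functions. The two wrapper names are my own, and stripping the padding is a convention rather than something the library does for you, so the decoder has to put it back.

```python
import base64


def b64url_encode(data: bytes) -> str:
    """URL-safe Base64 with the trailing = padding stripped."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the padding (length must be a multiple of 4) before decoding."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


sample = b"\xfb\xff"  # two bytes chosen so the standard encoding contains + and /
print(base64.b64encode(sample).decode("ascii"))  # standard alphabet, padded
print(b64url_encode(sample))                     # URL-safe alphabet, unpadded
```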

[–] [email protected] 5 points 9 months ago (1 children)

It gets subtle when you consider Unicode. But you said Latin alphabet, so you can look at just the UTF-8 section of this table, and assume 1 byte = 1 letter.

https://github.com/qntm/base32768#base32768

HTH

[–] [email protected] 9 points 9 months ago

I think we should consider Unicode, I want hashes that look like lau52gj🍀pr18e🍅

[–] [email protected] 4 points 9 months ago* (last edited 9 months ago)

https://www.unitconverters.net/numbers/decimal-to-base-36.htm

base 10 = 590741618446309885662238049322513167918815539779

base 16 = 6779C53432B8BADF049BB9D8924A5785DD887243

base 36 = C34WAO39N9K9XWPHW5W9XGRH0AHT0CG
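
If you'd rather not trust a website with this, the same conversion can be sketched in a few lines of Python, since its integers have no size limit. `to_base` is a helper I wrote for this example (Python has `int(text, base)` for parsing but no built-in for writing base 36).

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base(n: int, base: int) -> str:
    """Write a non-negative integer in the given base (2 to 36) by repeated division."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    if n == 0:
        return "0"
    out = []
    while n:
        n, remainder = divmod(n, base)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


value = int("6779C53432B8BADF049BB9D8924A5785DD887243", 16)  # parse the hex hash
base10 = to_base(value, 10)
base36 = to_base(value, 36)
print(len(base10), base10)
print(len(base36), base36)
```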

[–] [email protected] 3 points 9 months ago* (last edited 9 months ago)

Hashing won't fix sloppy typos/grammar.

How much would hash ~~digsets~~ digests be shortened if the whole ~~alphabel~~ alphabet was used ?

[–] [email protected] 2 points 9 months ago (2 children)

Yeah, you can always take a hex hash output and convert it to Base64... which does compress it significantly. Apply LZ Compression and boom.

[–] [email protected] 2 points 9 months ago

> Apply LZ Compression and boom.

That would produce a binary stream. If that's what OP wants, they could just leave the original hash in binary. And that would be unlikely to compress any further since hashes are, by their nature, high entropy already.
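
A quick way to see this for yourself, sketched in Python. I'm using zlib (DEFLATE, which is LZ77 plus Huffman coding) as a stand-in for "LZ compression"; other LZ variants will give different numbers, but the outcome should be similar for 20 high-entropy bytes.

```python
import zlib

# The 20 raw bytes of the SHA-1 hash quoted in the post.
digest = bytes.fromhex("6779c53432b8badf049bb9d8924a5785dd887243")

compressed = zlib.compress(digest, 9)  # level 9 = maximum compression effort

print(f"raw digest: {len(digest)} bytes")
print(f"compressed: {len(compressed)} bytes")
```

The "compressed" output comes out larger than the input, because there is nothing repetitive to exploit and the format adds its own header and checksum.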

[–] sarmale 1 points 9 months ago* (last edited 9 months ago) (1 children)

Tried to convert to base 64 and.. it actually makes it longer. Why?

[–] [email protected] 3 points 9 months ago (1 children)

You didn't convert a hex number into Base64, you Base64 encoded the hex string.

TL;DR, you used the wrong tool.
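
To show the difference, here is a small Python sketch. The variable names are just for illustration; the point is whether you hand Base64 the 40 ASCII characters of the hex text or the 20 raw bytes they stand for.

```python
import base64

hex_string = "6779c53432b8badf049bb9d8924a5785dd887243"

# What happened: Base64-encoding the hex *text*, i.e. 40 ASCII characters.
encoded_text = base64.b64encode(hex_string.encode("ascii")).decode("ascii")

# What was intended: decode the hex into 20 raw bytes first, then Base64 those.
encoded_bytes = base64.b64encode(bytes.fromhex(hex_string)).decode("ascii")

print(len(hex_string), "hex characters")
print(len(encoded_text), "characters when encoding the hex text")
print(len(encoded_bytes), "characters when encoding the raw bytes:", encoded_bytes)
```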

[–] sarmale 1 points 9 months ago (2 children)

What's the right tool? Can't seem to find one.

[–] [email protected] 3 points 9 months ago

If you're using Linux (or macOS or MinGW or CygWin or MSYS), you can do something like this in the terminal:

xxd -r -ps | base64

The first command reads standard input and decodes the hex string back into raw bytes, and the second one Base64-encodes that output.

If I pass the hex string mentioned in your original post through this command, I get:

Z3nFNDK4ut8Em7nYkkpXhd2IckM=
[–] [email protected] 1 points 9 months ago

https://ciphereditor.com/share#blueprint=eyJ0eXBlIjoiYmx1ZXByaW50IiwicHJvZ3JhbSI6eyJ0eXBlIjoicHJvZ3JhbSIsIm9mZnNldCI6eyJ4IjozMjQsInkiOjB9LCJmcmFtZSI6eyJ4IjotMTYwLCJ5IjotOTYsIndpZHRoIjozMjAsImhlaWdodCI6MTkyfSwiY2hpbGRyZW4iOlt7InR5cGUiOiJwcm9ncmFtIiwib2Zmc2V0Ijp7IngiOjAsInkiOi0yMDR9LCJmcmFtZSI6eyJ4IjoxNjMsInkiOi05NSwid2lkdGgiOjMyMCwiaGVpZ2h0Ijo0OH0sImxhYmVsIjoiSGFzaCBCYXNlNjQiLCJjaGlsZHJlbiI6W3sidHlwZSI6Im9wZXJhdGlvbiIsIm5hbWUiOiJAY2lwaGVyZWRpdG9yL2V4dGVuc2lvbi1oYXNoL2hhc2giLCJleHRlbnNpb25VcmwiOiJodHRwczovL2Nkbi5jaXBoZXJlZGl0b3IuY29tL2V4dGVuc2lvbnMvQGNpcGhlcmVkaXRvci9leHRlbnNpb24taGFzaC8xLjAuMC1hbHBoYS4xL2V4dGVuc2lvbi5qcyIsInByaW9yaXR5Q29udHJvbE5hbWVzIjpbImhhc2giLCJhbGdvcml0aG0iLCJtZXNzYWdlIl0sImZyYW1lIjp7IngiOi02MDksInkiOi00NjIsIndpZHRoIjozMjAsImhlaWdodCI6NDcwfSwiaW5pdGlhbEV4ZWN1dGlvbiI6dHJ1ZSwiY29udHJvbHMiOnsibWVzc2FnZSI6eyJ2aXNpYmlsaXR5IjoiZXhwYW5kZWQifSwiYWxnb3JpdGhtIjp7InZhbHVlIjoic2hhMy01MTIiLCJ2aXNpYmlsaXR5IjoiZXhwYW5kZWQifSwiaGFzaCI6eyJpZCI6IjYiLCJ2YWx1ZSI6eyJ0eXBlIjoiYnl0ZXMiLCJkYXRhIjoiTkRBNFpEazBNemcwTWpFMlpqZzVNR1ptTjJFd1l6TTFNamhsT0dKbFpERmxNR0l3TVRZeU1RPT0ifSwidmlzaWJpbGl0eSI6ImV4cGFuZGVkIn19fSx7InR5cGUiOiJvcGVyYXRpb24iLCJuYW1lIjoiQGNpcGhlcmVkaXRvci9leHRlbnNpb24tZXNzZW50aWFscy9iaW5hcnktdG8tdGV4dCIsImV4dGVuc2lvblVybCI6Imh0dHBzOi8vY2RuLmNpcGhlcmVkaXRvci5jb20vZXh0ZW5zaW9ucy9AY2lwaGVyZWRpdG9yL2V4dGVuc2lvbi1lc3NlbnRpYWxzLzEuMC4wLWFscGhhLjEvZXh0ZW5zaW9uLmpzIiwicHJpb3JpdHlDb250cm9sTmFtZXMiOlsiZGF0YSIsImFscGhhYmV0IiwicGFkZGluZyIsImVuY29kZWREYXRhIl0sImZyYW1lIjp7IngiOi0xNTgsInkiOi00NzAsIndpZHRoIjozMjAsImhlaWdodCI6NTY2fSwiaW5pdGlhbEV4ZWN1dGlvbiI6dHJ1ZSwiY29udHJvbHMiOnsiZGF0YSI6eyJpZCI6IjkiLCJ2YWx1ZSI6eyJ0eXBlIjoiYnl0ZXMiLCJkYXRhIjoiTkRBNFpEazBNemcwTWpFMlpqZzVNR1ptTjJFd1l6TTFNamhsT0dKbFpERmxNR0l3TVRZeU1RPT0ifSwidmlzaWJpbGl0eSI6ImV4cGFuZGVkIn0sImFscGhhYmV0Ijp7InZpc2liaWxpdHkiOiJleHBhbmRlZCJ9LCJwYWRkaW5nIjp7InZpc2liaWxpdHkiOiJleHBhbmRlZCJ9LCJlbmNvZGVkRGF0YSI6eyJ2aXNpYmlsaXR5IjoiZXhwYW5kZWQifX19LHsidHlwZSI6InZhcmlhYmxlIiwiYXR0YWNobWVudHMiOlsiOSIsIjYiXX1dfV19fQ