Hello,

I was inspired by Tom Scott's 7-seg video: https://www.youtube.com/watch?v=zp4BMR88260
In the closing lines he poses a challenge: if two words tie for the longest one you can display, the code would skip over one of them.
I think I've fixed it, in Node.js:

Code:
var fs = require("fs");
var words = fs.readFileSync("words.txt").toString();
words = words.split("\n");

var badLetters = /[gkmqvwxzio]/;
var currentLongestWord = "";
var longestWords = [];

for (var testWord of words) {
  if(testWord.length <= currentLongestWord.length) {
    continue;
  } else if(testWord.length == currentLongestWord.length) {}

  if(testWord.match(badLetters)) { continue; }
 
  if(testWord.length == currentLongestWord.length) {
    currentLongestWord = testWord;
    longestWords.push(testWord);
  }
  if(testWord.length >= currentLongestWord.length) {
    currentLongestWord = testWord;
  }
}

if (longestWords === undefined || longestWords.length == 0) {
  console.log(currentLongestWord);
} else {
  console.log(longestWords);
}


I think it's working.
But I want to be able to debug the output in an interactive console better than Node.js's REPL, just to make sure.

So I went to Ruby, and this is where problems arose.

Here's my equivalent Ruby code:

Code:
words = [ ]
File.open("words.txt", "r") do |f|
  f.each_line do |line|
    words += line.split
  end
end

badLetters = /[gkmqvwxzio]/
currentLongestWord = ""
longestWords = []

for testWord in words do
  if testWord.length <= currentLongestWord.length then
    next
  elsif testWord.length == currentLongestWord.length then
  end

  if testWord.match(badLetters) then
    next
  end
 
  if testWord.length == currentLongestWord.length then
    currentLongestWord = testWord
    longestWords.push(testWord)
  end
  if testWord.length >= currentLongestWord.length then
    currentLongestWord = testWord
  end
end

if longestWords == nil || longestWords.length == 0 then
  puts "#{currentLongestWord}"
else
  puts "#{longestWords}"
end


It should output "supertranscendentness" like the Node.js program does.
But it doesn't output anything. Did I translate anything wrong?

EDIT: I cleaned up the beginning part (which reads words.txt into an array) so it announces what it's doing, and made it run a bit faster (as fast as I can make Ruby go)

Code:
puts "sorting words ..."
words = File.read("words.txt")
words = words.split("\n")

badLetters = /[gkmqvwxzio]/
currentLongestWord = ""
longestWords = []

puts "finding solution ..."
=begin
*rest of code*
=end


That seemed to fix it.

One question: is this an optimal way of solving this problem? I know the code is messy and was originally written in JavaScript, Mateo. Apart from those two things, is there a better way to do this?
Algorithmically, the optimal solution examines each word only once but may require storage for every word (if every word is the same length). The rest is just details.

Looking at your code, I'll note that you only need to track the length of the longest word rather than its actual value (store that in the list of current longest words), and you only need to update the current longest if the test word is longer than the current one (saving an assignment).
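To make the tie handling concrete, here is a minimal Ruby sketch of that advice. It assumes the same bad-letter set used earlier in the thread; the method name `longest_displayable` and the sample list are made up for illustration and are not the real words.txt. The detail that matters is the comparison: skipping words with `<` rather than `<=` lets equal-length words through, so ties are collected instead of dropped.

```ruby
# Letters that cannot be shown on a seven-segment display (set taken from the thread).
BAD_LETTERS = /[gkmqvwxzio]/

# Returns every displayable word that ties for the greatest length.
# Only the best length is tracked; the words themselves live in the list.
def longest_displayable(words)
  best_len = 0
  best = []
  words.each do |word|
    # Strictly shorter words are skipped; equal-length words must get through.
    next if word.length < best_len || word.match?(BAD_LETTERS)

    if word.length > best_len
      # A new record length invalidates everything collected so far.
      best.clear
      best_len = word.length
    end
    best << word
  end
  best
end

# Hypothetical sample list, not the real words.txt.
sample = %w[bee quiz update pseudo hi sunset]
p longest_displayable(sample) # prints ["update", "sunset"]
```

Each word is examined once, and the list only ever holds words of the current best length.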

This seemed like a slightly interesting thing to hack up quickly, so I did an implementation in Rust:
Code:
use std::io::BufRead;

fn is_valid(s: &str) -> bool {
    const BAD_CHARS: &[char] = &['g', 'k', 'm', 'q', 'v', 'w', 'x', 'z', 'i', 'o'];
    !s.contains(BAD_CHARS)
}

fn scan<I: Iterator<Item=String>>(words: I) -> Vec<String> {
    let mut current_len = 0usize;
    let mut current_words = Vec::new();

    for word in words {
        if !is_valid(&word) || word.len() < current_len {
            continue;
        }

        if word.len() > current_len {
            current_words.clear();
            current_len = word.len();
        }
        current_words.push(word);
    }

    current_words
}

fn main() {
    let stdin = std::io::stdin();
    let words = stdin.lock().lines().map(|r| r.unwrap());

    for word in scan(words) {
        println!("{}", word);
    }
}
When run against the provided word list this completes in about 1300ms on my system in debug mode, and 45ms when built in release mode with LTO.
Code:
$ time ./target/debug/longwords < words.txt
supertranscendentness
three-and-a-halfpenny
./target/debug/longwords < words.txt  1.32s user 0.00s system 99% cpu 1.330 total
$ time ./target/release/longwords < words.txt
supertranscendentness
three-and-a-halfpenny
./target/release/longwords < words.txt  0.05s user 0.00s system 99% cpu 0.046 total


I started by playing around with a parallel approach, but with only ~450k words parallelizing wasn't cost-effective (presumably the overhead on this small working set overshadowed the gains). What I did there was filter valid words in parallel and collect that into an array, then sort that by length descending (again in parallel) and finally take items out of that array from the head as long as their length was equal to the length of the first item (which is known to be the longest because it's sorted).
For very large sets I'd expect the parallel approach to be faster, even though it's algorithmically suboptimal due to the cost of sorting the filtered array dominating.
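The filter, sort, and take-from-the-head steps described above can be sketched in Ruby as follows. This is a sequential sketch only (nothing here runs in parallel), and the method name `longest_by_sorting` and the sample list are assumptions for illustration.

```ruby
# Letters that cannot be shown on a seven-segment display (set taken from the thread).
BAD_LETTERS = /[gkmqvwxzio]/

# Filter, sort by length descending, then take from the head while the
# length still matches the first (longest) item.
def longest_by_sorting(words)
  valid = words.reject { |w| w.match?(BAD_LETTERS) }
  return [] if valid.empty?

  # sort_by is not guaranteed to be stable, so ties may come back in any order.
  sorted = valid.sort_by { |w| -w.length }
  top = sorted.first.length
  sorted.take_while { |w| w.length == top }
end

# Hypothetical sample list, not the real words.txt.
p longest_by_sorting(%w[bee quiz update pseudo hi sunset])
```

The sort makes this O(n log n) in the number of valid words, against a single pass for the scanning version, which matches the remark that it is algorithmically suboptimal.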
Tari wrote:
This seemed like a slightly interesting thing to hack up quickly, so I did an implementation in Rust:


I've never used Rust, so here's the Ruby code for your version:

Code:
puts "sorting words ..."
words = File.read("words.txt")
words = words.split("\n")

badLetters = /[gkmqvwxzio]/
current_len = 0
current_words = []

puts "finding solution ..."
for word in words do
  if word.match(badLetters) || word.length < current_len then
    next
  end

  if word.length > current_len then
    current_words.clear()
    current_len = word.length
  end
  current_words.push(word)
end

puts "#{current_words}"


Rust seems more low-level than Ruby, so I had a tough time translating it.

Thanks, Tari!

Edit : I cleaned up the last bit :

Code:
for i in current_words do
  puts "#{i}"
end


instead of :

Code:
puts "#{current_words}"


Edit 2:
I also made a version for Russian:

Code:
puts "sorting words ..."
words = File.read("words_RU.txt", :encoding => 'utf-8')
words = words.split("\n")

badLetters = /[джёзикмтфхшщьыюяgkmqvwxzio]/
current_len = 0
current_words = []

puts "finding solution ..."
for word in words do
  if word.match(badLetters) || word.length < current_len then
    next
  end

  if word.length > current_len then
    current_words.clear()
    current_len = word.length
  end
  current_words.push(word)
end

for i in current_words do
  puts "#{i}"
end

I am trying to learn Russian, and I thought that was interesting.
The words_RU.txt is from a dated GitHub repository of Russian words: https://github.com/hingston/russian/blob/master/100000-russian-words.txt
This file had junk data (words that are gibberish), so I filtered it, partly manually and partly automatically with CyberChef, to: https://gist.github.com/Izder456/a823c5939831f4245fd07030199aca3f

The answers are
"целенаправленно" or "tselenapravlenno" (which means "purposefully") and
"спецпереселенец" or "spetspereselenets" (which means "special settler").
Double posting, I know,
but... I have another question (not super related to the last post):
There are two ways to read a file in Ruby and make an array where each entry is one line of the original file
(I've named the two approaches with the methods "one" and "two"):

Code:
require 'benchmark'

def one
  time = Benchmark.measure {
    words = [ ]
    File.open("words.txt", "r") do |f|
      f.each_line do |line|
        words += line.split
      end
    end
    puts "#{words}"
  }
  puts time
end

def two
  time = Benchmark.measure {
    words = File.read("words.txt")
    words = words.split("\n")
    puts "#{words}"
  }
  puts time
end

puts "What type of word sorting?"
puts " 1 : for type one"
puts " 2 : for type two"
puts ""
print ">"
answer = gets.chomp

if answer == "1" then
  one
elsif answer == "2" then
  two
else
  puts "invalid input given"
  puts "run program again"
end


I've run a benchmark on the code:
"two" performs at ~0.6 seconds real elapsed time in repl.it and
"one" performs at ~0.9 seconds real elapsed time in repl.it

repl.it virtualises an environment in Linux x86_64.

Here's the words.txt file used in the code if you want to benchmark this on your machine (please do, so I get an idea of how this performs on real hardware): https://github.com/dwyl/english-words

Which is the more "ruby" way to do this; in other words, what method is most optimised for the language?
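For comparison, Ruby's standard library also has one-call ways to get the lines of a file, sketched below. `File.readlines` and `File.foreach` are real core methods (the `chomp: true` keyword needs Ruby 2.4 or later); the helper names `read_lines` and `count_lines` and the temporary file contents are made up for this example, which has not been timed against the real words.txt.

```ruby
require "tempfile"

# Reads the whole file into an array of lines, trailing newlines removed.
def read_lines(path)
  File.readlines(path, chomp: true)
end

# Streams the file one line at a time, so the full text is never held at once.
def count_lines(path)
  File.foreach(path, chomp: true).count
end

# Hypothetical stand-in for words.txt.
Tempfile.create("words") do |f|
  f.write("alpha\nbeta\ngamma\n")
  f.flush
  p read_lines(f.path)  # prints ["alpha", "beta", "gamma"]
  p count_lines(f.path) # prints 3
end
```

`read_lines` gives the same array as reading the file and splitting on "\n", while `File.foreach` suits cases where the words can be processed one at a time.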

EDIT: I messed up the data:
method one takes 389.754740 seconds real elapsed time and
method two takes 0.761029 seconds real elapsed time

I should have actually checked my numbers, sorry about that.
I know very little about Ruby in general, but the latter approach of reading the whole file into memory then splitting it is clearly less efficient. Not only did you measure it to be slower, but it requires twice as much memory because (I assume) it effectively needs to copy every line out of the blob read from a file while splitting.

Sometimes if you know you have a small problem that approach is easier though, since it's simpler to write and the resource requirements aren't always important.
Tari wrote:
I know very little about Ruby in general, but the latter approach of reading the whole file into memory then splitting it is clearly less efficient. Not only did you measure it to be slower, but it requires twice as much memory because (I assume) it effectively needs to copy every line out of the blob read from a file while splitting.

Sometimes if you know you have a small problem that approach is easier though, since it's simpler to write and the resource requirements aren't always important.


Wait, scratch my measurements, I think I mixed up one and two. I'm going to run the test again.

Edit: the results were wrong.
Here's my updated program:

Code:
require 'benchmark'
def one
  time = Benchmark.measure {
    words = [ ]
    File.open("words.txt", "r") do |f|
      f.each_line do |line|
        words += line.split
      end
    end
  }
  puts ": one :"
  puts time
end

def two
  time = Benchmark.measure {
    words = File.read("words.txt")
    words = words.split("\n")
  }
  puts ": two :"
  puts time
end

#call the functions
one
two


you can run this for yourself at : https://fileopenrb.isaacmeyer.repl.run
to get the results for your system config (cpu, ram, browser, os, browser extensions, etc)
Your run link times out because it's running that on a server, not your computer. Copying the script and running it myself though, indeed the line-by-line approach is much slower; about 180 seconds for the first and well under one second for the second.

It looks like IO#each_line is just incredibly inefficient, which surprises me; String#split on the other hand is much more efficient, judging from strace-ing each. The second (read-then-split) approach reads the whole file, does a little memory allocation and completes:

Code:
openat(AT_FDCWD, "words.txt", O_RDONLY|O_CLOEXEC) = 5
ioctl(5, TCGETS, 0x7fff95aa8600)        = -1 ENOTTY (Inappropriate ioctl for device)
fstat(5, {st_mode=S_IFREG|0644, st_size=4863005, ...}) = 0
lseek(5, 0, SEEK_CUR)                   = 0
mmap(NULL, 4866048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3af9090000
read(5, "2\n1080\n&c\n10-point\n10th\n11-point"..., 4863005) = 4863005
mremap(0x7f3af9090000, 4866048, 4874240, MREMAP_MAYMOVE) = 0x7f3af8bea000
read(5, "", 8192)                       = 0
mremap(0x7f3af8bea000, 4874240, 4866048, MREMAP_MAYMOVE) = 0x7f3af8bea000
close(5)                                = 0
brk(0x5575f57c0000)                     = 0x5575f57c0000
brk(0x5575f57e4000)                     = 0x5575f57e4000
brk(0x5575f5808000)                     = 0x5575f5808000
brk(0x5575f5831000)                     = 0x5575f5831000
brk(0x5575f5854000)                     = 0x5575f5854000
brk(0x5575f588a000)                     = 0x5575f588a000
brk(0x5575f58ac000)                     = 0x5575f58ac000
mmap(NULL, 303104, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3af94ea000
brk(0x5575f58d0000)                     = 0x5575f58d0000
brk(0x5575f58f4000)                     = 0x5575f58f4000
mremap(0x7f3af94ea000, 303104, 454656, MREMAP_MAYMOVE) = 0x7f3af947b000
brk(0x5575f5918000)                     = 0x5575f5918000
brk(0x5575f593c000)                     = 0x5575f593c000
brk(0x5575f5960000)                     = 0x5575f5960000
brk(0x5575f5984000)                     = 0x5575f5984000
brk(0x5575f59a8000)                     = 0x5575f59a8000
brk(0x5575f59cc000)                     = 0x5575f59cc000
...
mremap(0x7f3af914f000, 3448832, 5173248, MREMAP_MAYMOVE) = 0x7f3af86fb000
brk(0x5575f6b38000)                     = 0x5575f6b38000
brk(0x5575f6b5c000)                     = 0x5575f6b5c000
brk(0x5575f6b80000)                     = 0x5575f6b80000
brk(0x5575f6ba4000)                     = 0x5575f6ba4000
brk(0x5575f6bc8000)                     = 0x5575f6bc8000
brk(0x5575f6bec000)                     = 0x5575f6bec000
brk(0x5575f6c10000)                     = 0x5575f6c10000
brk(0x5575f6c34000)                     = 0x5575f6c34000
brk(0x5575f6c58000)                     = 0x5575f6c58000
brk(0x5575f6c7c000)                     = 0x5575f6c7c000

So it reads the whole file in one go, then does a not-unreasonable number of memory allocations for the split array (judging by the sizes passed to mremap(), it looks like it's growing the storage by a factor of 1.5 each time).
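As a quick check on the growth pattern, the ratio between the old and new sizes in the array-related mremap() calls can be computed directly. The two size pairs below are copied from the trace above; nothing else is assumed.

```ruby
# Sizes in bytes passed to mremap() in the trace: [old_size, new_size].
resizes = [[303_104, 454_656], [3_448_832, 5_173_248]]

# Growth factor of each resize.
ratios = resizes.map { |old, new| new.fdiv(old) }
p ratios # prints [1.5, 1.5]
```

Both resizes grow the mapping by exactly half again.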

The line-by-line approach's syscall trace is much uglier, following this pattern repeatedly:

Code:
read(5, "ss\ncanting\ncantingly\ncantingness"..., 8192) = 8192
brk(0x5645afedb000)                     = 0x5645afedb000
brk(0x5645aff4e000)                     = 0x5645aff4e000
brk(0x5645affc1000)                     = 0x5645affc1000
brk(0x5645b0034000)                     = 0x5645b0034000
brk(0x5645b00a7000)                     = 0x5645b00a7000
brk(0x5645b011a000)                     = 0x5645b011a000
brk(0x5645b018c000)                     = 0x5645b018c000
brk(0x5645b01ff000)                     = 0x5645b01ff000
brk(0x5645b011a000)                     = 0x5645b011a000
brk(0x5645b0034000)                     = 0x5645b0034000
brk(0x5645aff4e000)                     = 0x5645aff4e000
brk(0x5645afe68000)                     = 0x5645afe68000
mmap(NULL, 475136, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7600201000
mmap(NULL, 475136, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f760018d000
mmap(NULL, 475136, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7600119000
mmap(NULL, 475136, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f76000a5000
mmap(NULL, 475136, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7600031000
mmap(NULL, 475136, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f75fffbd000
mmap(NULL, 475136, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f75fff49000
munmap(0x7f75fff49000, 475136)          = 0
munmap(0x7f75fffbd000, 475136)          = 0
munmap(0x7f7600031000, 475136)          = 0
munmap(0x7f76000a5000, 475136)          = 0
munmap(0x7f7600119000, 475136)          = 0
munmap(0x7f760018d000, 475136)          = 0
munmap(0x7f7600201000, 475136)          = 0
It reads a short block of the file, then allocates a bunch of memory in the data section (for words?) and also allocates several big blocks via mmap that it frees again quickly.

Doing a large number of short reads is somewhat less efficient than one large one (though the single large read obviously requires more memory), but Ruby doing something extremely inefficient when iterating over lines appears to be a more dire problem. Capturing more information from each:

Code:
> /usr/bin/time -v ruby split.rb
: one :
141.071526  37.621011 178.692537 (181.985366)
        Command being timed: "ruby split.rb"
        User time (seconds): 141.13
        System time (seconds): 37.63
        Percent of CPU this job got: 98%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 3:02.11
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 164784
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 25
        Minor (reclaiming a frame) page faults: 25868806
        Voluntary context switches: 137
        Involuntary context switches: 17032
        Swaps: 0
        File system inputs: 7912
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
> /usr/bin/time -v ruby split.rb
: two :
  0.065714   0.019682   0.085396 (  0.085799)
        Command being timed: "ruby split.rb"
        User time (seconds): 0.10
        System time (seconds): 0.03
        Percent of CPU this job got: 98%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.14
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 43288
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 9718
        Voluntary context switches: 1
        Involuntary context switches: 17
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
The first uses about 40 seconds of "system" time, which probably can't all be blamed on short I/Os, only partially: going by the 8192-byte reads in the trace, the 4.8 MB file takes only about 600 read() calls, which isn't too awful.

More shockingly, line-by-line processing requires a lot more memory (peak 164 MB against split()'s 43 MB) and generates roughly 2,700 times more page faults (about 25.9 million against 9,718), probably relating to small memory allocations. The user time on the each_line one is also very high; combined with the huge amount of memory churn, it seems to be doing something extremely inefficient with memory management.
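One candidate explanation that the measurements above do not isolate is the `words += line.split` statement inside the loop rather than each_line itself. In Ruby, `Array#+` returns a brand-new array, so every iteration copies all the words collected so far, while `Array#concat` appends in place. The sketch below counts those copies on a small in-memory input; the method names and the generated word list are hypothetical, and it has not been run against the real words.txt.

```ruby
require "stringio"

# Mirrors method "one": += builds a new array on every line.
# Also counts how many existing elements get copied along the way.
def read_with_plus_equals(io)
  words = []
  copied = 0
  io.each_line do |line|
    copied += words.length # Array#+ copies every element already collected
    words += line.split    # into a freshly allocated array
  end
  [words, copied]
end

# Same loop, but concat appends to the existing array in place.
def read_with_concat(io)
  words = []
  io.each_line { |line| words.concat(line.split) }
  words
end

# Hypothetical input: 1000 one-word lines.
text = (1..1000).map { |i| "word#{i}" }.join("\n")
slow, copied = read_with_plus_equals(StringIO.new(text))
fast = read_with_concat(StringIO.new(text))
puts slow == fast # prints true
puts copied       # prints 499500
```

The copy count grows with the square of the number of lines: 1000 lines already cost about half a million element copies, so a list of roughly 450k words would be on the order of 10^11. That would be consistent with the heavy memory churn and page-fault count, though it is an inference from the code, not something the traces prove.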


Just for kicks, I did the same timing on my Rust version:

Code:
> /usr/bin/time -v ./target/release/longwords < words.txt
supertranscendentness
three-and-a-halfpenny
        Command being timed: "./target/release/longwords"
        User time (seconds): 0.04
        System time (seconds): 0.00
        Percent of CPU this job got: 86%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.05
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1748
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 1
        Minor (reclaiming a frame) page faults: 89
        Voluntary context switches: 7
        Involuntary context switches: 7
        Swaps: 0
        File system inputs: 1936
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Very low memory use!
Tari wrote:
*lots of info*

Thank you for your in-depth post.
By nature, Ruby is very slow, so I've tried putting it in Crystal, which has the syntax of Ruby but boasts C-like speed. Keep in mind this is still a beta and I'm using a very old version for stability reasons (Crystal 0.27.2 [60760a546] (2019-02-05)).
Here's the code if you don't mind (in Crystal):

Code:
require "benchmark"

def one
  puts "Testing : Type one"
  time = Benchmark.measure {
    words = %w()
    File.open("words.txt", "r") do |f|
      f.each_line do |line|
        words += line.split
      end
    end
  }
  puts "Results :"
  puts time
end

def two
  puts "Testing : Type two"
  time = Benchmark.measure {
    words = File.read("words.txt")
    words = words.split("\n")
  }
  puts "Results :"
  puts time
end

#call the functions
one
two


And the updated equivalent in Ruby:

Code:
require 'benchmark'

def one
  puts "Testing : Type one"
  time = Benchmark.measure {
    words = [ ]
    File.open("words.txt", "r") do |f|
      f.each_line do |line|
        words += line.split
      end
    end
  }
  puts "Results :"
  puts time
end

def two
  puts "Testing : Type two"
  time = Benchmark.measure {
    words = File.read("words.txt")
    words = words.split("\n")
  }
  puts "Results :"
  puts time
end

#call the functions
one
two


This way it runs both methods no matter what.

..but you can change the function calls to do what you want. The require "benchmark" line loads Ruby's built-in benchmarking module, which works pretty well.
You can read up on it here: https://ruby-doc.org/stdlib-2.5.0/libdoc/benchmark/rdoc/Benchmark.html
Well the compiled version seems to be a little lighter on memory (still not very good), and no faster. I'd guess it uses basically the same implementation under the hood, just skipping the interpreter.

Code:
> crystal build split.rb
> /usr/bin/time -v ./split
Testing : Type one
Results :
  524.181255   18.941093   543.122348 (  292.872190)
Testing : Type two
Results :
  0.144294   0.006565   0.150859 (  0.154710)
        Command being timed: "./split"
        User time (seconds): 524.32
        System time (seconds): 18.95
        Percent of CPU this job got: 185%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 4:53.03
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 59716
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 109450
        Voluntary context switches: 3838436
        Involuntary context switches: 73242
        Swaps: 0
        File system inputs: 4880
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

And whatever it's doing under the hood is insane (or doesn't strace well):

Code:
[pid 2147967] openat(AT_FDCWD, "words.txt", O_RDONLY|O_CLOEXEC) = 5
[pid 2147967] fcntl(5, F_GETFL)         = 0x8000 (flags O_RDONLY|O_LARGEFILE)
[pid 2147967] mmap(0x7f641a778000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6419758000
[pid 2147967] read(5, "2\n1080\n&c\n10-point\n10th\n11-point"..., 8192) = 8192
[pid 2147967] mmap(0x7f6419768000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6419748000
[pid 2147967] mmap(0x7f6419758000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6419738000
[pid 2147967] mmap(0x7f6419748000, 69632, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6419727000
[pid 2147967] rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
[pid 2147967] futex(0x7f641c21bae8, FUTEX_WAKE_PRIVATE, 2147483647) = 3
[pid 2147970] <... futex resumed>)      = 0
[pid 2147969] <... futex resumed>)      = 0
[pid 2147968] <... futex resumed>)      = 0
[pid 2147970] futex(0x7f641c21b6c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 2147969] futex(0x7f641c21b6c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 2147970] <... futex resumed>)      = 0
[pid 2147969] <... futex resumed>)      = 0
[pid 2147968] futex(0x7f641c21b6c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 2147967] futex(0x7f641c21baec, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 2147970] futex(0x7f641c21baec, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 2147969] futex(0x7f641c21baec, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 2147968] <... futex resumed>)      = 0
[pid 2147968] futex(0x7f641c21baec, FUTEX_WAKE_PRIVATE, 2147483647 <unfinished ...>
[pid 2147967] <... futex resumed>)      = 0
[pid 2147970] <... futex resumed>)      = 0
[pid 2147969] <... futex resumed>)      = 0
[pid 2147968] <... futex resumed>)      = 3
[pid 2147967] futex(0x7f641c21b6c0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 2147970] futex(0x7f641c21b6c0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 2147969] futex(0x7f641c21b6c0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 2147968] futex(0x7f641c21b6c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 2147967] <... futex resumed>)      = -1 EAGAIN (Resource temporarily unavailable)
[pid 2147970] <... futex resumed>)      = -1 EAGAIN (Resource temporarily unavailable)
[pid 2147969] <... futex resumed>)      = -1 EAGAIN (Resource temporarily unavailable)
[pid 2147968] <... futex resumed>)      = 0
Those futex waits across a couple threads continue as long as I care to watch the output, which is.. weird. I suppose it's related to the language using cooperative userspace threading (Go-style).

I tried building an optimized binary too (crystal build --release) but it didn't make a significant difference to runtime. In fact, the one run I did with the optimized binary was slower! There's something amazingly inefficient happening inside the libraries that makes it very very slow to read the file line-at-a-time, and that inefficiency appears to be inherited directly from the Ruby standard library in Crystal.
Tari wrote:
Well the compiled version seems to be a little lighter on memory (still not very good), and no faster. I'd guess it uses basically the same implementation under the hood, just skipping the interpreter.

Yeah, Crystal is still in the works and I didn't really expect much.

Tari wrote:
I tried building an optimized binary too (crystal build --release) but it didn't make a significant difference to runtime. In fact, the one run I did with the optimized binary was slower! There's something amazingly inefficient happening inside the libraries that makes it very very slow to read the file line-at-a-time, and that inefficiency appears to be inherited directly from the Ruby standard library in Crystal.

Maybe Crystal is held back by its Ruby compatibility in libs? Hm... strange.
  