I was asked to write some code for our schools acm contest teams. I have to write code that does string matching. When given a string and a regular expression I have to return a pointer to that position in the string. The regular expressions will only use | , * , and +. The pipe was easy enough and I was wondering if someone could evaluate the following pseudo code to make sure that I was on the right track as far the * operator goes.


Code:

//The three parts shall be stored in a vector for example a *b would be
//A vector containing [a, ,b]
regex_star:
  position1 = position(array[0] in input_string)
  position2 = position(array[2] in input_string)

  if they are not in string:
    return -1
  else:
    substring=sub(input_string, position1, position2)
    //If there are not any array[1] in substring then substring.length()
    //Should be length(array[0]) + length(array[2])
    if lengths are equal:
      return position1
    for i =0; i < input_string.length; i++:
      if substring.at(i) != array[1]:
        return -1

  return position1;

end regex_star


Edit: So this does work. But there is one test case that I did not account for. What if array[2] is before array[0] in the string. This should have a simple solution though.

Code:

compare position1 and position2:
  if position1 > position2:
    recursive find
    if none found:
      doesn't exist return 1
    else if found:
       position2 = new found position
       break
Alright here is some working code thus far. The only real thing left to do is write the regex string parser and add some form of recursion to allow for more complex strings such as (a|b)*t.


Code:

#include <iostream>
#include <string>
#include <vector>

using namespace std;

//Returns the position of the string if it is found, if it is not found then it will return -1
int regex_pipe(vector<string> pipe_search, string input_string) {
    //Must be of type size_t because that is the return value of the find function
    size_t string_found;
    //Each element in the vector contains an element found in the pipe
    for(int i = 0; i < pipe_search.size(); i++) {
        //Assigns the position if found to string_found.  If not found string found will equal string::npos
        string_found = input_string.find(pipe_search[i]);
        if(string_found != string::npos)
           return string_found;
    }
    return -1;
}

int regex_star(vector<string> star_search, string input_string) {
    size_t string_found;
    int position_one;
    int position_two;
    string substring;

    //Searches the string to see if the first part of the pattern exists
    //If it does not then the function exits with -1
    string_found = input_string.find(star_search[0]);
    if(string_found != string::npos)
        position_one = string_found;
    else
       return -1;

    //Searches the string to see if the second part of the pattern exists, anywhere after the first element
    //If it does not then the function exits with -1
    string_found = input_string.find(star_search[2], position_one);
    if(string_found != string::npos)
        position_two = string_found;
    else
        return -1;

    //Grabs a substring from position_one to the end of position_two
    substring = input_string.substr(position_one, (position_two - position_one) + 1);
    //If the length of the substring is the same length as elements 1 and 3 of the array then
    //The pattern will match because they will be side by side
    if(substring.length() == (star_search[0].length()+star_search[2].length()))
        return position_one;

    //Searches through the string making sure that only the element before the star is between them
    //If it isn't then it will return -1
    for(int i = star_search[0].length(); i < (substring.length() - star_search[2].length()); i++) {
        substring = substring[i];
        if(substring != star_search[1])
            return -1;
    }

    //Since the function has yet to leave, the answer will be at position_one
    return position_one;

}

int regex_plus(vector<string> plus_search, string input_string) {
    size_t string_found;
    int position_one;
    int position_two;
    string substring;
    string inbetween;

    //Searches the string to see if the first part of the pattern exists
    //If it does not then the function exits with -1
    string_found = input_string.find(plus_search[0]);
    if(string_found != string::npos)
        position_one = string_found;
    else
        return -1;

    //Searches the string to see if the second part of the pattern exists anywhere after position_one
    //If it does not then the function exits with -1
    string_found = input_string.find(plus_search[2], position_one);
    if(string_found != string::npos)
        position_two = string_found;
    else
        return -1;

    //Grabs a substring from beggining of position_one to the end of position_two
    substring = input_string.substr(position_one, (position_two - position_one) +1);

    //Checks to see if the substring is of length of every element of plus_search combined
    //If it is then it will only have one character inbetween it that is the search term
    if(substring.length() == (plus_search[0].length()+plus_search[1].length()+plus_search[2].length())) {
        inbetween = substring.at(substring.length() - plus_search[2].length() - plus_search[0].length());
        if(inbetween == plus_search[1])
            return position_one;
    }

    //Loops through making sure everything between the two is the same as plus[search[1] if
    //Not then the function will exit with -1
    for(int i = plus_search[0].length(); i < (substring.length() - plus_search[2].length());i++) {
        substring = substring[i];
        if(substring != plus_search[1])
            return -1;
    }

    //If the function makes it this far then the string exists and starts at position_one
    return position_one;
}

int main(int argc, char** argv) {
    //Main function with the vectors already setup, the actually parsing
    //Of the regex string is not yet complete

    vector<string> pipe_parser;
    vector<string> star_parser;
    vector<string> plus_parser;
    string my_string = "This is a test string";
    string pipe_search_string = "b | a";
    string star_search_string = "a *t";
    string plus_search_string = "a +t";

    //After parsing this is what pipe_parser vector should hold for pipes
    pipe_parser.push_back("a");
    pipe_parser.push_back("b");

    //After parsing this is what star_parser vector should hold for stars
    star_parser.push_back("a");
    star_parser.push_back(" ");
    star_parser.push_back("t");

    //After parsing this is what plus_parser vector should hold for pluses
    plus_parser.push_back("a");
    plus_parser.push_back(" ");
    plus_parser.push_back("t");

    cout << "Pipe" << endl;
    cout << "---------------------------------" << endl;
    cout << "Should be found at pos: 8" << endl;
    cout << "Really found at: " << regex_pipe(pipe_parser, my_string) << endl;

    cout << endl;
    cout << "Star" << endl;
    cout << "---------------------------------" << endl;
    cout << "Should be found at pos: 8" << endl;
    cout << "Really found at: " << regex_star(star_parser, my_string) << endl;

    cout << endl;
    cout << "Plus" << endl;
    cout << "--------------------------------" << endl;
    cout << "Should be found at pos: 8" << endl;
    cout << "Really found at: " << regex_plus(plus_parser, my_string) << endl;
}


Well I have made yet another error, after running some tests, when searching for the word is in the string "This is a test", the letters is in This are found, because find does a character based search. I have now come up with some code that can perform a word based search, and just have to incorporate it. Other than that I believe everything is working just fine.
So I went to my proffessors office today and asked him to look at the code and he said he had and I was doing it wrong. After telling me I was wrong he introduced me to man 7 regex and man 3 regex. From there we wrote a very simple api to take care of what we needed and the problem is solved.
Ah, I was going to mention a few conceptual errors I spotted in your algorithm. Smile Did you therefore end up just wrapping the standard regex parser, or did you end up implementing your own from the spec?
We just wrote a wrapper for it, which took about 20 minutes.
  
Register to Join the Conversation
Have your own thoughts to add to this or any other topic? Want to ask a question, offer a suggestion, share your own programs and projects, upload a file to the file archives, get help with calculator and computer programming, or simply chat with like-minded coders and tech and calculator enthusiasts via the site-wide AJAX SAX widget? Registration for a free Cemetech account only takes a minute.

» Go to Registration page
Page 1 of 1
» All times are UTC - 5 Hours
 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

 

Advertisement