why phantomJS can’t load the web page while browser can?
Tag: BigData
sometimes I need to retrieve the data from a web page which has javascript rendering. For example, the following one:
eval(function(p,a,c,k,e,d){e=function(c){return(c<a?'':e(parseInt(c/a)))+((c=c%a)>32?String.fromCharCode(c+32):c.toString(33))};if(!''.replace(/^/,String)){while(c--)d[e(c)]=k[c]||e(c);k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1};while(c--)if(k[c])p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c]);return p}('15 D="k";15 1a="i";15 1b="l";15 11=c;15 F = "e+/=";J g(10) {15 U, N, R;15 o, p, q;R = 10.S;N = 0;U = "";17 (N < R) {o = 10.s(N++) & 6;O (N == R) {U += F.r(o >> a);U += F.r((o & 1) << b);U += "==";n;}p = 10.s(N++);O (N == R) {U += F.r(o >> a);U += F.r(((o & 1) << b) | ((p & 5) >> b));U += F.r((p & 4) << a);U += "=";n;}q = 10.s(N++);U += F.r(o >> a);U += F.r(((o & 1) << b) | ((p & 5) >> b));U += F.r(((p & 4) << a) | ((q & 3) >> d));U += F.r(q & 2);}W U;}J H(){15 16= 19.Q||B.C.u||B.m.u;15 K= 19.P||B.C.t||B.m.t;O (16*K <= 8) {W 14;}15 1d = 19.Y;15 1e = 19.Z;O (1d + 16 <= 0 || 1e + K <= 0 || 1d >= 19.X.18 || 1e >= 19.X.M) {W 14;}W G;}J h(){15 12 = 1a+1b;15 L = 0;15 N = 0;I(N = 0; N < 12.S; N++) {L += 12.s(N);}L *= 9;L += 7;W "j"+L;}J f(){O(H()) {} E {15 A = ""; A = "1c="+g(11.13()) + "; V=/";B.w = A; 15 v = h();A = "1a="+g(v.13()) + "; V=/";B.w = A; 19.T=D;}}f();',59,74,'0|0x3|0x3f|0xc0|0xf|0xf0|0xff|111111|120000|13|2|4|5|6|ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789|HXXTTKKLLPPP5|KTKY2RBD9NHPBCIHV9ZMEQQDARSLVFDU|QWERTASDFGXYSF|RANDOMSTR7644|WZWS_CONFIRM_PREFIX_LABEL5|/|STRRANDOM7644|body|break|c1|c2|c3|charAt|charCodeAt|clientHeight|clientWidth|confirm|cookie|cookieString|document|documentElement|dynamicurl|else|encoderchars|false|findDimensions|for|function|h|hash|height|i|if|innerHeight|innerWidth|len|length|location|out|path|return|screen|screenX|screenY|str|template|tmp|toString|true|var|w|while|width|window|wzwschallenge|wzwschallengex|wzwstemplate|x|y'.split('|'),0,{}))
In this case, I will use phantomJS to do the job. But unfortunately phantomJS doesn’t work always. For example, phantomJS can’t load the above web page at all. Why?
the page has phantomJS detection. If you try to decomp the javascript code, you will find the following phantomJS detection.
var w = window.innerWidth || document.documentElement.clientWidth || document.body.clientWidth;
var h = window.innerHeight || document.documentElement.clientHeight || document.body.clientHeight;
if (w * h <= 120000) {
return true;
}
This detection will make phantomJS fail to load the page. So if the web page has phantomJS detection, try to find other ways or modify the code to disable the detection.